A Pilot Genome-wide Association Study of Breast Cancer Susceptibility Loci in Indonesia

Genome-wide association studies (GWASs) provide a systematic approach for revealing novel genetic susceptibility loci for breast cancer. However, genetic association studies have hitherto been conducted primarily in women of European ancestry. We therefore performed a pilot GWAS using the Affymetrix® Genome-Wide Human SNP Array 5.0 platform, which contains 443,813 single nucleotide polymorphisms (SNPs), to search for new genetic risk factors in 89 breast cancer cases and 46 healthy women of Indonesian ancestry. Case-control association was evaluated using PLINK. The strengths of allelic and genotypic associations were assessed using logistic regression analysis and reported as odds ratios (ORs) and P values; P values less than 1.00×10⁻⁸ and 5.00×10⁻⁵ were required for significant and suggestive association, respectively. After analyzing 292,887 SNPs, we identified 11 chromosome loci with suggestive associations with breast cancer risk. Of these, only four loci contained identified genes: chromosome 2p12 with the CTNNA2 gene [OR=1.20, 95% confidence interval (CI)=1.13–1.33, P=1.08×10⁻⁷]; chromosome 18p11.2 with the SOGA2 gene (OR=1.32, 95% CI=1.17–1.44, P=6.88×10⁻⁶); chromosome 5q14.1 with the SSBP2 gene (OR=1.22, 95% CI=1.11–1.34, P=4.00×10⁻⁵); and chromosome 9q31.1 with the TEX10 gene (OR=1.24, 95% CI=1.12–1.35, P=4.68×10⁻⁵). This pilot study thus identified 11 chromosome loci that exhibited suggestive associations with the risk of breast cancer among Indonesian women.


Introduction
Breast cancer is one of the most common malignancies affecting women worldwide, including those living in Indonesia (Balmain et al., 2003; Kim et al., 2012; Shu et al., 2012). Some studies have reported that breast cancer is the second most frequently diagnosed cancer and the second leading cause of cancer death, after cervical cancer, among Indonesian women. The incidence of breast cancer in Indonesia is rising, and breast cancer is expected to become the most commonly diagnosed cancer (Tjindarbumi and Mangunkusumo, 2002; Ng et al., 2011).
The etiology of breast cancer is exceptionally complex and appears to involve numerous factors, including genetic, endocrine, and external or environmental factors (Iodice et al., 2010; Phipps et al., 2011; Kim et al., 2012; Islam et al., 2013). Genetic factors play an important role in the etiology of breast cancer (Nathanson et al., 2001; Balmain et al., 2003). In recent years, the genome-wide association study (GWAS) has provided a systematic technique for identifying genetic variants associated with breast cancer; the most common type of genetic variation is the single nucleotide polymorphism (SNP). In this study, we defined genetic variants as chromosome loci that were associated with breast cancer risk. To date, approximately 20 susceptibility loci have been identified in GWASs and associated with breast cancer risk (Easton et al., 2007; Hunter et al., 2007; Stacey et al., 2007; Gold et al., 2008; Stacey et al., 2008; Ahmed et al., 2009; Thomas et al., 2009; Antoniou et al., 2010; Gaudet et al., 2010; Long et al., 2010; Turnbull et al., 2010; Fletcher et al., 2011; Zhang et al., 2011; Mahdi et al., 2013).
Most published GWASs were conducted among women of European ancestry and, to a lesser extent, among Asians. It is therefore important to evaluate whether these variants confer risk across different ancestral backgrounds. Furthermore, some studies have revealed that many of the variants discovered in European-ancestry populations show only a weak association, or none, with breast cancer in other ethnic groups (Zheng et al., 2010; Guan et al., 2014). Hence, there is a need to conduct GWASs in non-European women, and in Indonesian women in particular, to disclose the genetic basis of breast cancer susceptibility.

Study population
In this study we recruited a total of 135 women, consisting of 89 breast cancer patients and 46 healthy controls. Both groups were selected from the same population living in Jakarta and included only unrelated subjects. The sporadic breast cancer patients were recruited from The National Cancer Center of Dharmais Hospital (Jakarta, Indonesia) between 2008 and 2012. Information on each patient's disease history was obtained from medical records. All patients recruited in this study had primary breast cancer with unilateral breast tumors. The diagnosis of cancer was confirmed by histopathological examination. The clinicopathological characteristics of the patients are summarized in Table 1. As controls, we included healthy volunteers who had no personal or family history of cancer or other chronic diseases. Blood samples were collected from all subjects consecutively between 2008 and 2012. Cases and controls were matched by age.
All subjects who participated in this study provided written informed consent to participate and to allow genetic analysis of their biological samples. The study protocol was approved by the ethics committees of The National Cancer Center of Dharmais Hospital, Jakarta, Indonesia.

Genomic DNA purification
Approximately 5 mL of peripheral blood was taken from each subject and transferred into an EDTA-coated tube to prevent coagulation. Genomic DNA was purified using a DNA isolation kit from QIAGEN®. DNA concentration and quality were analyzed with a NanoDrop 2000®. We used only DNA with a concentration of more than 100 ng/µL. The ratio of absorbance at 260 nm to that at 280 nm was used to assess DNA purity; a ratio between 1.7 and 1.8 is generally accepted as indicating pure DNA. In addition, we used agarose gel electrophoresis to determine whether the DNA was fragmented. Fragmented DNA appears as smeared bands, whereas unfragmented DNA yields a single band in the agarose gel. We used only unfragmented DNA.
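The acceptance criteria described above can be written as a simple filter. The following Python sketch is illustrative only: the record type, field names, and function name are hypothetical and are not part of the study's workflow, and the gel result is represented simply as a boolean flag.

```python
from dataclasses import dataclass

# Thresholds taken from the text; names are illustrative assumptions.
MIN_CONC_NG_PER_UL = 100.0      # concentration must exceed this value
PURITY_RATIO_RANGE = (1.7, 1.8)  # accepted A260/A280 range


@dataclass
class DnaSample:
    sample_id: str          # hypothetical identifier
    conc_ng_per_ul: float   # concentration from the spectrophotometer
    a260: float             # absorbance at 260 nm
    a280: float             # absorbance at 280 nm
    single_band: bool       # True if the agarose gel shows one unsmeared band


def passes_dna_qc(sample: DnaSample) -> bool:
    """Return True if the sample meets all three criteria from the text."""
    ratio = sample.a260 / sample.a280
    low, high = PURITY_RATIO_RANGE
    return (
        sample.conc_ng_per_ul > MIN_CONC_NG_PER_UL
        and low <= ratio <= high
        and sample.single_band
    )
```

A sample with a concentration of 150 ng/µL, an A260/A280 ratio of 1.75, and a single gel band would be accepted under these assumptions; failing any one criterion rejects it.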

Genotyping and quality control
The initial 135 subjects were genotyped using the Genome-Wide Human SNP Array 5.0 platform (Affymetrix®), which contains a total of 443,813 SNPs. The sex of all study samples was confirmed to be female. Standard SNP quality control excluded SNPs with a minor allele frequency (MAF) < 5% or a call rate < 98%, SNPs that deviated from Hardy-Weinberg equilibrium (P≤1.00×10⁻⁶), SNPs on the X chromosome, and nonpolymorphic SNPs; a total of 292,887 SNPs remained for further analysis. For sample quality control, we assessed cryptic relatedness for each sample with the identity-by-state method. To examine population stratification, we carried out principal component analysis (PCA) using EIGENSTRAT software v5.0. As a reference, we used genotypes from four HapMap populations: Yoruba from Ibadan (YRI), Caucasians from Utah (CEU), Japanese from Tokyo (JPT), and Chinese from Beijing (CHB). Using the top two principal components (eigenvectors), we drew a scatter plot to identify outliers who did not belong to the JPT/CHB cluster. Afterwards, we carried out PCA using only the genotype data from the case and control subjects to assess population substructure. We subsequently examined potential population stratification by constructing a quantile-quantile (Q-Q) plot of observed against expected P values and by computing the inflation factor (λ).
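The per-SNP exclusion criteria can be illustrated with a short Python sketch. This is not the software used in the study: the function name and its arguments are hypothetical, the Hardy-Weinberg check here is a one-degree-of-freedom chi-square goodness-of-fit test substituted for whatever test the authors' software applied, and the X-chromosome exclusion is not shown.

```python
import math


def snp_qc(n_aa, n_ab, n_bb, n_missing,
           maf_min=0.05, call_rate_min=0.98, hwe_p_max=1e-6):
    """Apply MAF, call-rate, and HWE filters to one biallelic SNP.

    Inputs are genotype counts (AA, AB, BB) and the number of missing calls.
    Returns (passes, maf, call_rate, hwe_p).
    """
    n_called = n_aa + n_ab + n_bb
    n_total = n_called + n_missing
    if n_called == 0:
        return False, 0.0, 0.0, 1.0
    call_rate = n_called / n_total

    freq_a = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(freq_a, 1.0 - freq_a)
    if maf == 0.0:
        # Nonpolymorphic SNP: excluded outright.
        return False, maf, call_rate, 1.0

    # Expected genotype counts under Hardy-Weinberg equilibrium.
    expected = (
        n_called * freq_a ** 2,
        2 * n_called * freq_a * (1.0 - freq_a),
        n_called * (1.0 - freq_a) ** 2,
    )
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 df.
    hwe_p = math.erfc(math.sqrt(chi2 / 2.0))

    passes = (
        maf >= maf_min
        and call_rate >= call_rate_min
        and hwe_p > hwe_p_max
    )
    return passes, maf, call_rate, hwe_p
```

For example, `snp_qc(30, 60, 45, 0)` describes a fully called, common SNP in 135 subjects and passes all three filters, whereas `snp_qc(125, 10, 0, 0)` fails because its minor allele frequency is below 5%.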

Statistical analyses
Case-control association was evaluated using PLINK version 1.07. The strengths of allelic and genotypic associations were assessed using logistic regression analysis and reported as odds ratios (ORs) and P values; P values of less than 1.00×10⁻⁸ and 5.00×10⁻⁵ were required for significant and suggestive association, respectively. To obtain an overview of the association between chromosome position and breast cancer, a Manhattan plot was drawn using Haploview 4.1.
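To make the reported quantities concrete, the Python sketch below computes an odds ratio, a 95% confidence interval, and a P value from a 2×2 table of allele counts, and then applies the two thresholds stated above. It is a simplified stand-in, not the logistic regression performed in PLINK: it uses the Woolf (log-OR) interval and a Pearson chi-square test, and the function names are hypothetical.

```python
import math

SIGNIFICANT_P = 1.00e-8   # threshold for significant association (from the text)
SUGGESTIVE_P = 5.00e-5    # threshold for suggestive association (from the text)


def allelic_association(case_minor, case_major, ctrl_minor, ctrl_major):
    """Return (odds_ratio, (ci_low, ci_high), p_value) for a 2x2 allele table.

    All four counts must be positive.
    """
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    odds_ratio = (a * d) / (b * c)

    # Woolf standard error of log(OR) and the resulting 95% interval.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

    # Pearson chi-square for a 2x2 table (1 df, no continuity correction).
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return odds_ratio, (ci_low, ci_high), p_value


def classify(p_value):
    """Label a P value using the thresholds stated in the text."""
    if p_value < SIGNIFICANT_P:
        return "significant"
    if p_value < SUGGESTIVE_P:
        return "suggestive"
    return "none"
```

With 89 cases and 46 controls, each table holds 178 case alleles and 92 control alleles; a call such as `allelic_association(60, 118, 20, 72)` uses made-up counts of that size and is not data from this study.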

Results
A total of 89 female breast cancer cases and 46 healthy female controls were included in this study (Table 1). The ages of the cases and controls ranged from 31 to 61 years and from 31 to 74 years, respectively. The mean age at diagnosis for breast cancer cases was 47.81 years (SD 7.21), with no significant difference between the case and control groups (P=0.966). All subjects with breast cancer were negative for mutations in the BRCA1 and BRCA2 genes. Table 1 also presents the quality control indices of the DNA samples, namely the 260/280 absorbance ratio and the DNA concentration.
All subjects were genotyped using the Genome-Wide Human SNP Array 5.0 platform (Affymetrix®), which contains 443,813 SNPs, to identify genetic variants associated with susceptibility to breast cancer in Indonesian women. Of the 443,813 genotyped SNPs, 31,344 were dropped because of a call rate < 98%, and analysis continued on the remaining 412,469 SNPs. A total of 116,590 SNPs were excluded because of MAF < 5%. After marker quality control for Hardy-Weinberg equilibrium (P≤1.00×10⁻⁶), 18 SNPs were dropped from further analysis. Finally, a total of 292,887 autosomal SNPs were evaluated for their association with the risk of breast cancer by logistic regression analysis.
Furthermore, we explored the population structure by calculating the inflation factor λ using all 292,887 SNPs that passed quality control. We observed no evidence of inflation of the test statistic (λ = 1.04), suggesting that any population substructure (if present) had no substantial effect on the association results. We also carried out PCA in order to exclude the possibility of population substructure in our sample. The Manhattan plot, which plots -log10(P values) from the GWAS and imputation analysis against chromosome position, shows that no chromosome locus achieved the standard level of genome-wide significance (P<1.00×10⁻⁸). Eleven chromosome loci, however, showed suggestive association (P<5.00×10⁻⁵) with breast cancer risk (Figure 1). The most statistically significant association was observed at locus 2p12 [OR=1.20, 95% confidence interval (CI)=1.13–1.33, P=1.08×10⁻⁷], in an intron of the CTNNA2 (cadherin-associated protein) gene (Table 2).
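The SNP counts reported above can be tallied step by step. The Python sketch below assumes that the filters were applied sequentially in the order given; the function and key names are illustrative. The three itemized exclusions leave 295,861 SNPs, which is 2,974 more than the 292,887 analyzed. The text does not itemize this remainder; it may correspond to the X-chromosome and nonpolymorphic SNPs listed among the exclusion criteria, but that attribution is an assumption.

```python
def snp_attrition():
    """Tally SNP counts through the itemized quality-control steps."""
    genotyped = 443_813
    after_call_rate = genotyped - 31_344   # call rate < 98% removed
    after_maf = after_call_rate - 116_590  # MAF < 5% removed
    after_hwe = after_maf - 18             # HWE P <= 1.00e-6 removed
    analyzed = 292_887                     # final count reported in the text
    return {
        "after_call_rate": after_call_rate,
        "after_maf": after_maf,
        "after_hwe": after_hwe,
        "analyzed": analyzed,
        # Not itemized in the text; presumably other exclusion criteria.
        "unitemized": after_hwe - analyzed,
    }


if __name__ == "__main__":
    for step, count in snp_attrition().items():
        print(f"{step}: {count:,}")
```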
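The inflation factor λ is commonly computed as the median of the observed one-degree-of-freedom chi-square statistics divided by the median expected under the null hypothesis (about 0.455). The Python sketch below follows that common definition using only P values as input; it is an illustration under that assumption, not the authors' computation, and the function name is hypothetical.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 df (approximately 0.455).
EXPECTED_MEDIAN_CHI2 = _STD_NORMAL.inv_cdf(0.75) ** 2


def inflation_factor(p_values):
    """Return lambda from a sequence of P values, each in the range (0, 1]."""
    # A two-sided P value p corresponds to a 1-df chi-square statistic z**2,
    # where z is the standard normal quantile at p / 2.
    chi2_stats = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2_stats) / EXPECTED_MEDIAN_CHI2
```

Evenly spread P values, such as `[i / 1000 for i in range(1, 1000)]`, give λ close to 1, while a set skewed toward small P values gives λ above 1. Values near 1, such as the 1.04 reported here, are conventionally read as little inflation of the test statistic.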

Discussion
Genome-wide studies of common and rare genetic variations in multiple populations are required to disclose the spectrum of susceptibility alleles that contribute to the risk of breast cancer. We performed a GWAS including 89 breast cancer cases and 46 controls to investigate common genetic variants associated with breast cancer risk among Indonesian women. We identified 11 chromosome loci that showed suggestive association (P<5.00×10⁻⁵) with breast cancer risk in Indonesia. Of these 11 loci, however, only four contained identified genes: 2p12 (CTNNA2), 18p11.2 (SOGA2), 5q14.1 (SSBP2), and 9q31.1 (TEX10).
The variant at locus 2p12 lies in an intron of the catenin (cadherin-associated protein) alpha 2 (CTNNA2) gene (The National Center for Biotechnology Information, 2014). The main function of CTNNA2 is to link cadherin adhesion receptors to the cytoskeleton in order to regulate cell-cell adhesion and differentiation in the nervous system. During development, CTNNA2 also regulates the morphological plasticity of synapses and cerebellar and hippocampal lamination (Crown Human Genome Center, 2014; The National Center for Biotechnology Information, 2014).
This gene encodes catenin alpha-2 (α-catenin), a 102 kDa protein that consists of 860 amino acids. This protein plays a significant role in cancer development through several mechanisms or pathways. First, in the cytoplasm, α-catenin binds to catenin beta-1 (β-catenin) and E-cadherin (ECAD), and the interaction of these three proteins induces tissue invasion and metastasis. Second, the binding of α-catenin, β-catenin, and ECAD activates lymphoid enhancer-binding factor-1 (LEF-1). Activated LEF-1 stimulates survivin, which has a critical role in evading apoptosis. LEF-1 also contributes to proliferation by activating c-Myc and cyclin D1 (The National Center for Biotechnology Information, 2014).
The role of the CTNNA2 gene in breast cancer development has not been reported. However, Fanjul-Fernandez et al. (2013) investigated mutations of the CTNNA2 gene in laryngeal carcinomas and concluded that CTNNA2 is a tumor suppressor gene that is often mutated in these tumors.
The second suggestively associated locus was 18p11.2, in the suppressor of glucose autophagy associated 2 (SOGA2) gene, which is also known as the microtubule crosslinking factor 1 (MTCL1) gene. The role of this gene in the development of cancer, however, is still unknown (The National Center for Biotechnology Information, 2014).
Another locus, 5q14.1, is located in the single-stranded DNA binding protein 2 (SSBP2) gene. The product of this gene is a subunit of a single-stranded DNA (ssDNA)-binding complex that is involved in the maintenance of genome stability (The National Center for Biotechnology Information, 2014). To date, the role of the SSBP2 gene in breast cancer remains unclear. Nevertheless, some investigations have reported a role for SSBP2 in other cancers, such as glioblastoma and acute myelogenous leukemia. Xiao et al. (2012) concluded that there was a statistically significant association between increased tumor expression of SSBP2 and poorer overall survival of glioblastoma patients. Moreover, in research focused on acute myelogenous leukemia, Liang et al. (2005) reported that SSBP2 is a novel regulator of hematopoietic growth and differentiation.
The last locus was 9q31.1, in the testis-expressed sequence 10 (TEX10) gene. The product of this gene is a component of the Five Friends of Methylated CHTOP (5FMC) complex (Crown Human Genome Center, 2014; The National Center for Biotechnology Information, 2014). As with the three other genes discussed above, the role of the TEX10 gene in breast cancer is still unclear. One study, however, suggested that TEX10 is a novel cancer-related protein that could possibly be used as a prognostic or diagnostic marker for head and neck squamous cell carcinoma (Yoshitake et al., 2012).
One of the GWAS studies in Asia was conducted by Guan et al. (2014) and included 1,009 Chinese women (487 breast cancer patients and 522 healthy controls). Guan and coworkers sought to verify 10 susceptibility SNPs that had been reported in populations of European ancestry. Among the 10 SNPs genotyped, only one (rs10941679) showed a significant association with breast cancer risk. This result supports the statement in our Introduction that genetic variants discovered in European-ancestry populations show only a weak association, or none, with breast cancer in Asia.
This study has several limitations. The main weakness is the small number of participants: we included only 135 subjects (89 breast cancer cases and 46 healthy controls). Given this sample size, the suggestive chromosome loci identified here still require validation in larger studies of Indonesian women as well as of other populations. Cooperation among local institutions and international organizations is therefore needed to investigate further common genetic variants related to breast cancer risk.
In summary, our pilot genome-wide association study identified 11 chromosome loci with suggestive association with breast cancer risk among women of Indonesian ancestry. These loci have not been reported in GWASs of other populations.