Massive Parallel Sequencing for Diagnostic Genetic Testing of BRCA Genes - a Single Center Experience

Introduction
Mutations in the BRCA1 and BRCA2 genes increase the risk of hereditary breast and ovarian cancer (HBOC). BRCA1/2 mutations are commonly detected by Sanger sequencing. However, the large sizes of the BRCA1/2 coding sequences (5592 bp and 10,257 bp, respectively) make this procedure expensive, and the high cost limits the widespread clinical use of the test. To date, analysis of the BRCA genes is offered primarily to patients with a strong family history of breast and ovarian cancer (Euhus et al., 2002). However, almost 50% of HBOC patients are carriers of inherited mutations of BRCA1 or BRCA2 and have no family history of the disease (King et al., 2003). These include, for instance, patients who have inherited mutations paternally and have no female relatives in the paternal line. Currently, these patients are rarely referred for genetic testing of the BRCA genes.
Until recently, the diagnosis of BRCA1/2 mutations was used mainly to estimate the individual risk of HBOC and to plan preventive diagnostics. In September 2013, AstraZeneca initiated phase III clinical trials of olaparib, an inhibitor of poly(ADP-ribose) polymerase (PARP), which acts on tumor cells with defective BRCA1 or BRCA2. With the advent of this drug, BRCA1/2 testing became a powerful predictive tool for targeted therapy. The development of a cost-effective method of BRCA1/2 testing opens the possibility of extending this treatment to a wider group of patients.
Massive parallel sequencing (MPS) is a technology based on the simultaneous sequencing of spatially separated DNA molecules. MPS throughput reaches 10 Gb, and the cost per nucleotide is low. Thus, the use of MPS for resequencing of BRCA1/2 in pooled samples significantly reduces the cost of mutation analysis in the BRCA genes. The BRCA genes have already been sequenced by MPS on the platforms 454 FLX (Leeneer et al., 2011), GS Junior (Roche) (Feliubadalo et al., 2013), Genome Analyzer (Illumina) (Morgan et al., 2010; Walsh et al., 2010), SOLiD System, Ion PGM/Ion Proton (Invitrogen) (Chan et al., 2012), and HeliScope (Helicos BioSciences) (Thompson et al., 2012).
Taking into account the recommendations of earlier studies, we have developed a workflow for testing the BRCA genes on the MPS platform MiSeq (Illumina), adapted to the diagnostic laboratory setting. The workflow presented herein requires small amounts of DNA and is cost-effective because the laborious DNA and PCR product normalization steps are eliminated. In the present study, we describe our use of the MPS method in the routine diagnostics of BRCA mutations in 96 HBOC patients under current observation in the Altai Krai Oncological Clinic.

Study participants
The participants were 96 patients with primary invasive breast cancer who met one of the following hereditary cancer criteria: (a) at least two cases of breast cancer before age 50 in the family; (b) at least three cases of breast cancer in the family; (c) bilateral breast cancer; (d) male cancer; cases of breast or ovarian cancer in the family; or (e) early breast cancer (before age 40). The family histories and ages of the patients are listed in Table 1. The study participants were diagnosed in the Altai Krai Oncological Clinic between 2012 and 2014. During that period, all patients with breast cancer were tested for hot-spot mutations in BRCA1 (5382insC, C61G, 4154delA, 185delAG, 2080delA, 3819del5, 3875del4) and BRCA2 (6174delT, 9318del4, 1528del4). Patients without hot-spot BRCA1/2 mutations were selected for the study. Written informed consent was obtained from all study participants.

DNA extraction
DNA was obtained from the following sample types: EDTA-treated peripheral blood leukocytes (n=75); whole blood drops dried on paper (hereinafter "blood print," n=14); and buccal wash epithelia (n=7). DNA was extracted from the peripheral blood leukocytes using an in-house method comprising cell lysis with 10% SDS, proteinase K treatment, protein extraction using phenol-chloroform, and ethanol precipitation of the DNA. DNA was extracted from the blood prints and buccal wash samples using the QIAamp® DNA Blood Mini Kit (Qiagen). All DNA samples were quantified using PicoGreen (Promega). The sample amounts varied approximately 25-fold. We did not equalize the DNA concentration for subsequent PCR.

PCR-based target amplification
We developed primer sets covering the entire BRCA1 and BRCA2 coding regions and splice sites (at least 15 nucleotides). The gene-specific primers contained a universal 5'-end tail (forward primer: 5'-acacgacgctcttccgatct-3'; reverse primer: 5'-gacgtgtgctcttccgatct-3'). The primer sequences are available upon request. For each sample, 86 singleplex PCRs with gene-specific primers were performed in a total volume of 16 µl. The amplification mixture included 10 mM Tris-HCl (pH 8.9), 2.5 mM MgCl2, 55 mM KCl, 200 µM of each dNTP, 1.25 nM Syto13, 300 nM of forward and reverse primers, 0.5 U AmpliTaq Gold (Life Technologies), and 2-50 ng of DNA. The temperature cycling protocol was as follows: 12 min at 95°C; 30 cycles of denaturation at 95°C for 10 sec, annealing at 60°C for 10 sec, and extension at 72°C for 50 sec; and a final extension at 72°C for 2 min. PCRs were performed on a CFX384 instrument (Bio-Rad, Hercules, CA). All target PCR products were pooled and purified with magnetic beads. The amount of PCR product was evaluated by endpoint fluorescence (EF). EF values varied less than 5-fold. We considered such variation insignificant for MPS and therefore did not normalize the samples before pooling.
Aliquots of the pooled samples were diluted 100-fold, and 2 µl of the diluted product was used as the template for a second PCR. The second round of PCR was performed using a pair of barcoded primers (30 nM), each consisting of a 3'-end universal tail, an 8-nucleotide barcode sequence, and a 5'-end bridge adaptor, together with a pair of common primers (300 nM) corresponding to the 5'-end adaptor of the barcoded primers. All sequences were provided by Illumina (Diego et al., 2014). The temperature cycling protocol was as follows: 12 min at 95°C; 15 cycles of denaturation at 95°C for 10 sec, annealing at 60°C for 10 sec, and extension at 72°C for 50 sec; and a final extension at 72°C for 2 min.
After the second PCR, all 96 patient-specific products were pooled and purified using magnetic beads. The DNA concentration of the pooled sample was evaluated by qPCR with the primers 5'-aatgatacggcgaccaccga-3' and 5'-caagcagaagacggcatacga-3' and the TaqMan probe 5'-FAM-tccctacacgacgctcttccg-FQ-3', using PhiX serial dilutions as standards.
To determine the best pipeline, we varied the following parameters: SNP and InDel calling programs (Samtools,

Results
Using the MPS approach, we sequenced the BRCA1 and BRCA2 genes in 96 patients. Of these, 16 random samples were characterized with conventional Sanger sequencing and were used to assess the sensitivity and specificity of the workflow presented in this study.

Overall Quality, Tag Sorting, Mapping Reads and Coverage of Sequencing
Asian Pacific Journal of Cancer Prevention, Vol 16, 2015. DOI: http://dx.doi.org/10.7314/APJCP.2015.16.17.7935
The BRCA1/2 exons and adjacent splice sites (approximately 40 but not less than 15 nucleotides) were resequenced on the MiSeq platform. The total number of read pairs was 8,282,483, and the base-call quality ranged from 24 to 39, corresponding to a maximum probability of a wrong base call of 0.4%. Therefore, no quality filters were applied. Only 70.7% (5,854,421) of the reads contained barcodes without variants in the nucleotide sequence. The remaining barcodes contained one (13.22%), two (3.04%) or more variants, or corresponded to different reference sequences (13.06%). Only read pairs bearing barcodes without variants were selected for further analysis.
The reads were aligned against the BRCA1 (NC_000017.10) and BRCA2 (NC_000013.10) reference sequences. A high mapping rate (99.2±0.1%) was attained for all samples. The mean coverage of each PCR locus and each sample was evaluated. Coverage was sufficient (>100×) in all cases. The mean amplicon coverage was 1012 (182-2269) reads. Per-sample coverage ranged from 240 to 4652 reads (on average, 1019 reads). All DNA samples with the lowest coverage had been extracted from blood prints. The distributions of coverage across the amplicons and samples are shown in Figure 1.
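The link between the reported quality range and the 0.4% error figure follows from the standard Phred definition, P = 10^(-Q/10). The short sketch below only illustrates that conversion; the function name is hypothetical and it is not part of the authors' pipeline.

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q into the probability of a wrong base call.

    Uses the standard definition P = 10 ** (-Q / 10).
    """
    return 10 ** (-q / 10)


# Quality range reported in the text: Q24 (worst) to Q39 (best).
worst_case = phred_to_error_probability(24)  # roughly 0.004, i.e. about 0.4%
best_case = phred_to_error_probability(39)   # roughly 0.00013

print(f"Q24 -> {worst_case:.4%} error probability")
print(f"Q39 -> {best_case:.4%} error probability")
```

The lowest observed quality (Q24) therefore sets the upper bound on the per-base error probability quoted in the text.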

Variant calling
We formed a training set from 16 samples to optimize the variant calling process and assess the accuracy of the MPS method using Sanger sequencing as a reference standard. In total, the training set contained 68 polymorphic variants in BRCA1, including 11 different single nucleotide variants (SNV) and one deletion.
We investigated the identification of the BRCA1 variants using the GATK tool. After alignment and raw variant calling, 103 polymorphic variants of the BRCA1 gene (92 SNVs and 11 deletions) were identified. Some of these calls were false positives (FP). To reduce the number of FP variants, we tested additional filters that removed (i) variants with coverage below 20× or 50×; and (ii) variants for which the frequency of reads with the variant allele (VF) was below a threshold of 10-40%. We also compared three variant calling tools (FreeBayes, SAMtools and GATK). The results of applying these tools and the different variant calling filters are shown in Table 2. The optimal tools and filter thresholds were selected to achieve the minimum number of FP variants and the minimum number of false negative variants (FN, i.e., lost true positive variants). In this way, the maximum sensitivity [TP/(TP + FN)] was reached (TP denotes a true positive). The best results were achieved using the GATK tool and excluding variants with coverage <20× or VF <14%. With these settings, the sensitivity was 100%. The specificity was calculated as the fraction of TP among all positives [TP/(TP + FP)]. This so-called positive predictive value (PPV) is more informative than the standard specificity [TN/(TN + FP)] (Zvelebil et al., 2007). By this measure, the variant calling specificity was 94.4%.
Next, the variants of all 96 samples were identified. As determined by the analysis of the training set, only variants with a combined coverage of at least 20 reads and a VF of at least 14% were considered. In total, we identified 977 variants in BRCA1 and BRCA2 (46 unique SNVs and nine indel variants). All identified mutations were tested using Sanger sequencing. As a result, three SNVs and six indel variants were not confirmed. We analyzed all FP variants and found that all three false positive SNVs directly adjoined the 3'-end of one of the primers of the first PCR and were detected in only one strand. An additional requirement (the presence of the variant in both forward and reverse reads) allowed us to exclude all three false positive SNVs.
All six FP indel variants were located in or close to homopolymeric regions, particularly in poly(A) tracts. The most likely cause of these FP variants was random errors introduced by the non-proofreading AmpliTaq polymerase (Life Technologies). The library preparation stage of our workflow contained two rounds of PCR; therefore, the probability of random errors was rather high. For example, a 2080delA deletion at position 61,455 of BRCA1 (exon 11) was found in 15% of the reads. Sanger sequencing of the corresponding DNA fragment showed the presence of the 2080delA allele when the amplicons were generated using AmpliTaq Gold polymerase, whereas only the reference allele was present when Pfu polymerase was used. Thus, it is essential to use a proofreading enzyme in preparing the MPS library. Interestingly, some of the FP deletions are present in the BIC database, particularly 2080delA in BRCA1 and 1806delA in BRCA2. It is possible that some of the BIC database indels were caused by nonrandom sequencing errors. All SNVs and indel variants confirmed by Sanger sequencing are shown in Table 3.
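The two performance measures defined above can be written out as a small calculation. The sketch below is illustrative only: TP = 68 and FN = 0 follow from the 68 training-set variants and the reported 100% sensitivity, whereas FP = 4 is an assumption chosen because it reproduces the reported 94.4%; the actual counts are given in Table 2.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true variants that were detected: TP / (TP + FN)."""
    return tp / (tp + fn)


def positive_predictive_value(tp: int, fp: int) -> float:
    """Fraction of reported variants that are real: TP / (TP + FP)."""
    return tp / (tp + fp)


# TP and FN are implied by the text; FP = 4 is an ASSUMED value for illustration.
tp, fn, fp = 68, 0, 4

print(f"Sensitivity: {sensitivity(tp, fn):.1%}")
print(f"PPV:         {positive_predictive_value(tp, fp):.1%}")
```

Note that the PPV depends only on the variants that were called, which is why it can be computed without enumerating true negatives.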
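The filter set described above (coverage of at least 20 reads, VF of at least 14%, and support on both strands) can be expressed as a simple predicate. This is a minimal sketch under stated assumptions: the `VariantCall` record and its field names are hypothetical and do not correspond to the output format of GATK or any other caller, and the example calls are invented.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """Hypothetical per-variant summary; field names are illustrative."""
    name: str
    coverage: int            # total reads covering the position
    variant_reads_fwd: int   # variant-supporting reads on the forward strand
    variant_reads_rev: int   # variant-supporting reads on the reverse strand

    @property
    def variant_fraction(self) -> float:
        """Fraction of reads carrying the variant allele (VF)."""
        if self.coverage == 0:
            return 0.0
        return (self.variant_reads_fwd + self.variant_reads_rev) / self.coverage


def passes_filters(call: VariantCall,
                   min_coverage: int = 20,
                   min_vf: float = 0.14) -> bool:
    """Keep a call only if it meets the coverage, VF and both-strand criteria."""
    return (call.coverage >= min_coverage
            and call.variant_fraction >= min_vf
            and call.variant_reads_fwd > 0
            and call.variant_reads_rev > 0)


# Invented example calls, not data from the study.
calls = [
    VariantCall("well_supported", coverage=100, variant_reads_fwd=25, variant_reads_rev=25),
    VariantCall("low_coverage", coverage=15, variant_reads_fwd=5, variant_reads_rev=5),
    VariantCall("low_vf", coverage=100, variant_reads_fwd=5, variant_reads_rev=5),
    VariantCall("single_strand", coverage=100, variant_reads_fwd=40, variant_reads_rev=0),
]
kept = [c.name for c in calls if passes_filters(c)]
print(kept)
```

The strand requirement is the criterion that removes primer-adjacent artefacts seen in only one read direction.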

Variant annotation
All variants were categorized by gene location (intronic or exonic) and by their predicted effects (frameshift, synonymous substitution, missense substitution, splice site alteration or nonsense). All variants were annotated for their frequency on the basis of their presence in dbSNP135 or their frequency in the 1000 Genomes Project. We defined a variant as "rare" if it was not found in dbSNP135 or if its allele frequency was <0.001 in the 1000 Genomes Project (Abecasis et al., 2010). To evaluate the clinical significance of these variants, we compared all variants to the gene-specific mutation database (BIC, http://research.nhgri.nih.gov/bic/). In addition, we predicted the effects of rare missense SNVs on protein function using the SIFT (Kumar et al., 2009) and PROVEAN algorithms. We defined a variant as "functional" if both SIFT and PROVEAN predicted it to be damaging.
As a result, we identified 46 unique true variants using MPS. Of these, 17 variants were defined as rare. Three of the rare variants (p.2418_2418del in BRCA2; p.115_115del and p.Q5385X in BRCA1) were disease-causing, and the remaining variants consisted of eight splice site alterations (one c.80+118->AT insertion and seven single-base substitutions), seven missense substitutions and three synonymous substitutions. The role of the 11 remaining rare variants in the development of HBOC is unclear. We defined these variants as likely pathogenic. However, none of them were predicted to be damaging by both SIFT and PROVEAN.
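The two annotation rules above ("rare" and "functional") can be sketched as boolean checks. The function names and arguments are hypothetical. How a variant that is absent from the 1000 Genomes Project should be treated is not specified in the text; here a missing frequency is assumed not to satisfy the frequency criterion on its own.

```python
from typing import Optional


def is_rare(in_dbsnp135: bool,
            allele_freq_1000g: Optional[float],
            max_allele_freq: float = 0.001) -> bool:
    """Rare = not found in dbSNP135, or 1000 Genomes allele frequency < 0.001.

    ASSUMPTION: a missing frequency (None) does not count as 'below threshold'.
    """
    below_threshold = (allele_freq_1000g is not None
                       and allele_freq_1000g < max_allele_freq)
    return (not in_dbsnp135) or below_threshold


def is_functional(sift_damaging: bool, provean_damaging: bool) -> bool:
    """Functional = both SIFT and PROVEAN predict the variant to be damaging."""
    return sift_damaging and provean_damaging


# Invented examples, not variants from the study.
print(is_rare(in_dbsnp135=True, allele_freq_1000g=0.25))    # common variant
print(is_rare(in_dbsnp135=True, allele_freq_1000g=0.0005))  # low frequency
print(is_rare(in_dbsnp135=False, allele_freq_1000g=None))   # novel variant
print(is_functional(sift_damaging=True, provean_damaging=False))
```

Requiring agreement between both predictors is the conservative choice: a variant flagged by only one tool is not labelled functional.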

Discussion
The growing demand for genetic testing and for shorter analysis turnaround times has led to the need to improve the sequencing technology used in our laboratory. We considered the possibility of using an MPS-based method of target sequencing. Despite a number of studies reporting the successful application of this method in determining disease-causing mutations, existing MPS platforms are rarely used in clinical practice. In our opinion, one of the key limitations is that the method is not designed for sequencing heterogeneous clinical samples, which often have low concentrations of DNA or consist of poorly preserved, fragmented DNA. In the present study, we used the Illumina MiSeq benchtop platform to identify BRCA1/2 mutations in patients with HBOC. We obtained patient DNA samples from various sources and did not use any additional sample preparation methods to standardize them. Using Sanger sequencing as a reference method, we estimated the sensitivity and specificity of the MPS test and the possibility of its application in a routine clinical setting.
The complete BRCA1/2 resequencing workflow consisted of enrichment of the targeted regions of interest, target sequencing, and finally, bioinformatic analysis. Target enrichment significantly reduces the sequencing cost per base and improves the accuracy of variant detection. Several approaches to enrichment have been proposed. The first is the amplicon enrichment approach, which uses singleplex, multiplex or long-range PCR amplicons as the MPS template (Morgan et al., 2010; Leeneer et al., 2011; Hernan et al., 2012). Singleplex PCR is one of the simplest and least demanding methods, as it does not require a large amount of starting DNA. However, it has limited throughput. The main advantage of multiplex PCR is its higher throughput, though it requires a larger amount of DNA, and the amplification results can vary significantly depending on the reagents and laboratory equipment used. Both of these methods can give rise to nonrandom sequencing errors introduced by Taq polymerase during amplicon synthesis. Long-range PCR uses a mixture of proofreading and non-proofreading DNA polymerases; thus, products of up to 40 kb can be synthesized. This method is highly productive because the target locus is amplified in only a few reactions, and the use of a proofreading DNA polymerase provides a low error rate. The disadvantage of this method is that most of the amplified DNA falls in noncoding sequences, significantly increasing the analysis cost.
The second approach is the selection of DNA fragments by hybridization with oligonucleotide probes (Albert et al., 2007; Okou et al., 2007; Gnirke et al., 2009). The disadvantages of this approach include the nonspecific hybridization of homologous sequences (pseudogenes in particular) and the highly variable efficiency with which the probes bind the target fragments (Gnirke et al., 2009). A recent whole exome study has registered wide variations in read coverage, ranging from zero to several hundredfold.
The third, "selector"-based approach involves selector probe capture and ligase-assisted DNA circularization to bring about target enrichment (Johansson et al., 2011). Until recently, this method was not properly optimized. Selector- and hybridization-based approaches generally require 0.8-2 µg of DNA per sample as the starting material, which may limit their applicability as diagnostic tools.
For the present study, we used an in-house singleplex PCR method of enrichment, which proved to be the most convenient for us because our laboratory has had many years of experience in analyzing BRCA using classical Sanger sequencing. Thus, we were able to use oligonucleotide primers and amplification modes initially optimized for Sanger sequencing. In addition, this method does not require a large amount of DNA; this factor is essential because, in our practice, we must often deal with a small amount of DNA obtained from a blood print on paper or from buccal epithelia.
Target enrichment is a crucial step in sequencing, and an appropriate enrichment method provides complete and uniform coverage of the targeted regions of interest. To date, there is no consensus as to which method of enrichment is preferable. Using our PCR-based approach, we observed reasonably uniform coverage (variation <4) for 74 out of 86 amplified loci. Of the remaining loci, two were over-represented (coverage >2000) and 10 were under-represented (coverage <500), with four- to 12-fold variation in the coverage of these loci. In subsequent studies, we have achieved a more uniform coverage (variation <5) by changing the ratio of amplicons in the pool of first-round universal-tailed PCR products (unpublished data).
The cost of MPS-based testing depends not only on the sequencing cost but also on the cost of individual sample preparation prior to sequencing. The sample preparation process was significantly simplified by eliminating the DNA quantification and normalization steps. In this paper, we used samples with different amounts of DNA (2-50 ng) as the template for the first PCR, and all provided comparable amounts of PCR products (variation <5). For all of the samples, coverage sufficient for reliable detection of mutations was obtained. However, some of the samples isolated from blood prints had highly fragmented DNA and a significantly lower average coverage (fewer than 300 reads).
A key stage in the implementation of all MPS platforms is bioinformatic analysis. Despite a number of BRCA resequencing studies, the generation of a reliable variant list remains the bottleneck of this type of project. To separate the TP and FP variants, a set of filters must be applied. We used a relatively simple set of filters (coverage >20 reads, VF >14% and presence in both forward and reverse strands). Stricter VF/coverage cutoffs reduced the number of FP variants (which enhanced specificity) but increased the number of FN (which reduced sensitivity). Other studies have recommended a VF cutoff ranging from 10% (Morgan et al., 2010) to 25% (Feliubadalo et al., 2013) and a minimum coverage ranging from 10× (Morgan et al., 2010) to 50× (Walsh et al., 2010). Some studies have used additional filters, such as a quality score >30 (Leeneer et al., 2011; Feliubadalo et al., 2013). We preferred weaker filters, as sensitivity is more important than specificity for a diagnostic test. A large number of FP variants only increases the load on confirmatory Sanger sequencing but does not affect the results of the test. The present results indicate that the workflow we have developed allows for the detection of single nucleotide variants and indels with a specificity of 94.4% and a sensitivity of 100%.
In addition to small indels and single-base substitutions, large genomic rearrangements (LGRs) have been identified in HBOC families and account for a significant proportion of HBOC cases (from approximately 1% (Thomassen et al., 2006) up to 12-25% (Montagna et al., 2003; Walsh et al., 2006) in different populations). Theoretically, the MPS method may be used to identify LGRs using the relative ratios of reads. Two recent studies have reported the accurate identification of large genomic duplications and deletions (Walsh et al., 2010; Thompson et al., 2012). However, these studies used a hybridization-based method for the target enrichment of the DNA library. We used two rounds of PCR to enrich the DNA library with BRCA1/2 sequences. Previous studies (Leeneer et al., 2011; Feliubadalo et al., 2013) have shown this method to be unreliable for LGR detection.
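The coverage categories used above can be illustrated with a short sketch that labels each amplicon by its mean coverage and reports the fold variation across a set of amplicons. The thresholds (2000 and 500 reads) come from the text; the function names and the amplicon labels are hypothetical, and the three example values are simply the minimum, mean and maximum amplicon coverage reported in the Results.

```python
from typing import Sequence

OVER_THRESHOLD = 2000   # reads; above this an amplicon is over-represented
UNDER_THRESHOLD = 500   # reads; below this an amplicon is under-represented


def classify_amplicon(mean_coverage: float) -> str:
    """Label an amplicon by its mean coverage using the thresholds in the text."""
    if mean_coverage > OVER_THRESHOLD:
        return "over-represented"
    if mean_coverage < UNDER_THRESHOLD:
        return "under-represented"
    return "uniform"


def fold_variation(coverages: Sequence[float]) -> float:
    """Ratio of the highest to the lowest coverage in a set of amplicons."""
    return max(coverages) / min(coverages)


# Hypothetical amplicon labels; values are the reported min, mean and max.
example = {"amplicon_low": 182, "amplicon_mid": 1012, "amplicon_high": 2269}
for label, cov in example.items():
    print(label, cov, classify_amplicon(cov))
print(f"Fold variation: {fold_variation(list(example.values())):.1f}")
```

Amplicons that fall between the two thresholds differ by at most 4-fold, which matches the "variation <4" criterion used for the uniformly covered loci.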
Relying on our previous experience, we estimated the consumables and time spent in using our MPS approach in BRCA testing. After accounting for the additional analyses (Sanger sequencing to confirm the mutations), we found that the overall cost of testing one sample was approximately two times lower using the MPS method compared with Sanger sequencing. The hands-on time and turnaround time were also noticeably reduced.
The cost of testing for one patient is calculated on the basis that 96 patients can be analyzed in a single MPS run. In 2013, 957 women and four men with newly diagnosed breast cancer and 218 women with ovarian cancer were identified in the Altai Krai Oncological Clinic. Thus, the analysis turnaround time is approximately one month (provided that testing for inherited BRCA mutations is offered to all patients regardless of age and family history of the disease). This long waiting period before the analysis can begin noticeably reduces the value of the method for clinical diagnostics. However, adding DNA samples from paraffin blocks to the current BRCA NGS workflow for somatic mutation testing, to determine the sensitivity of tumors to a new targeted drug (a PARP inhibitor), could considerably improve its clinical usability. The time spent in analysis can also be reduced if patients with various genetic disorders are analyzed in a single MPS run. To do this, it is necessary to develop a uniform MPS workflow for different types of genetic testing.
Conclusion: The present study confirms the possibility of using MPS for genetic testing in routine clinical practice while pointing out some limitations of this method. The method presented here demonstrated excellent sensitivity and specificity (100% and 94.4%, respectively) and could be applied to different types of clinical specimens.
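The one-month estimate can be reproduced with simple arithmetic on the patient numbers given above. The sketch assumes that the figure reflects the time needed to accumulate 96 samples for one run when all newly diagnosed patients are tested and that patients arrive evenly over the year; both are interpretations and are not stated explicitly in the text.

```python
# Newly diagnosed patients in 2013 (from the text).
breast_cancer_women = 957
breast_cancer_men = 4
ovarian_cancer_women = 218

samples_per_run = 96  # patients analyzed in a single MPS run

patients_per_year = breast_cancer_women + breast_cancer_men + ovarian_cancer_women
runs_per_year = patients_per_year / samples_per_run

# ASSUMPTION: even patient accrual over 12 months.
months_to_fill_one_run = 12 / runs_per_year

print(f"Patients per year:      {patients_per_year}")
print(f"Full runs per year:     {runs_per_year:.1f}")
print(f"Months to fill one run: {months_to_fill_one_run:.2f}")
```

Under these assumptions a full batch of 96 samples accumulates roughly once a month, which is consistent with the turnaround time quoted in the text.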