Technologies to generate replenishable sources of target-specific antibodies have revolutionized biomedical research and the diagnosis and treatment of diseases. Hybridoma technology (1) is a highly effective and well-established method to generate murine monoclonal antibodies, and is still widely used to produce antibodies for a variety of applications, including therapeutic antibodies. More recently, in vitro methods have been developed to generate target-specific antibodies. Most notably, the development of in vitro display technologies (2) such as phage display, has enabled rapid isolation of target-specific antibodies from large antibody libraries. Advantages of the in vitro display methods include the speed and ease of antibody generation, the ability to control various selection parameters, and the ability to generate fully human antibodies for therapeutic development. Consequently, high-affinity, high-specificity antibodies suitable for demanding applications can readily be produced and engineered by these technologies, and phage display is now a major technical platform to generate therapeutic antibody candidates.
The success of in vitro antibody generation depends largely upon the quality and the size of the antibody library. In the case of phage and yeast display libraries (the two most widely used methods), the size of a library is determined simply by the efficiency of host cell transformation. The quality of a library, on the other hand, is influenced by many different factors. This is especially true for synthetic antibody libraries (Table 1). Natural antibody libraries (3-7) are constructed by PCR amplification of V(D)J-rearranged immunoglobulin genes from B-cell cDNA and thus do not require human input during the generation of sequence diversity; for synthetic antibody libraries, by contrast, a strategy for sequence diversification is necessary, and indeed critical. Most of the existing synthetic antibody libraries have their sequence diversity concentrated in the complementarity determining regions (CDRs), generated by random combinations of mono- or trinucleotide units (8-16). The CDRs of a synthetic antibody library need to be designed so that the resulting library is enriched with diverse, yet nature-like sequences that are stable and highly expressed in host cells. Well-designed synthetic antibody libraries have several distinct advantages, including high levels of expression, good solubility and stability, and ease of engineering and optimization. In this review, the design philosophies and diversification strategies of previously reported synthetic antibody libraries are compared and discussed in detail, with emphasis on phage display libraries, which account for the majority of known large antibody libraries.
Table 1. Summary of some of the synthetic antibody libraries discussed in this review
STRUCTURAL ORGANIZATION OF ANTIBODY LIBRARIES
Most of the antibody libraries for in vitro display are in either the Fab or the scFv format. Fab is a ~50 kDa fragment of the whole immunoglobulin that consists of the Fd chain (VH-CH1) and the light chain (VL-CL), disulfide bonded to each other at the C-terminus, while scFv is a ~25 kDa fragment in which VH and VL are connected by a flexible linker sequence. Compared to scFv, Fab is generally more stable, and its binding activity is better retained when converted to a whole immunoglobulin. On the other hand, the expression level of Fab in E. coli is on average lower than that of scFv, and sequencing analysis of Fab requires two reactions per clone, whereas scFv generally needs only one. A well-known problem of scFv is that some of the clones may lose their binding activity when reformatted to IgG (17). ScFv is also prone to multimerization (see below) (18), which improves the apparent affinity through the avidity effect but needs to be avoided for applications that require monomeric interaction. For scFv libraries, the domain orientation can be either VH-linker-VL (9-11, 19) or VL-linker-VH (20), although there can be a considerable difference in expression level and binding activity between the two orientations (21-23). The linker sequence is generally ~15 amino acids long and rich in glycines and polar amino acids (3, 7, 10, 11); however, shorter linker sequences can also be used, and these facilitate the multimerization of scFv (24). To summarize crudely, Fab has more consistent and reliable binding activity, whereas scFv has better manipulability. The format of the library needs to be decided with these factors in mind when designing an antibody library.
Human immunoglobulin gene loci contain a number of variable, diversity (for the heavy chain), and joining genes for heavy and light (kappa and lambda) chains. As a result, serum immunoglobulin and natural antibody libraries derived from the same source, are complex mixtures of antibody clones that differ from one another in their biochemical and biophysical properties. In natural antibody libraries, this results in uneven propagation and preferential enrichment of fast-growing clones, and biases the panning output toward highly expressed clones rather than high affinity clones. Also, many of the selected binder clones have frameworks that are poorly expressed in E. coli, thus impeding the early steps of the antibody screening process. Prudent selection of the framework sequences for the synthetic antibody library can help to significantly reduce many of these problems, although at the expense of the framework diversity. Human germline immunoglobulin variable segments such as DP47 and DPK22 are frequently employed as templates for the synthetic antibody library construction (10, 11), based on their favorable characteristics such as high stability, high level of expression in various hosts, compatibility with one another in forming stable VH -VL interface, and favored usage in human antibody response (25).
Many synthetic antibody libraries use a single framework sequence; this makes library design and construction simple, and the clones are more uniform in their properties. It may be argued that a single framework cannot accommodate the diverse paratope conformations required for the recognition of various epitopes with different shapes, but there are many reports that large synthetic antibody libraries with a single or a very limited number of framework sequences are capable of generating antibodies against diverse antigens and epitopes (10, 11, 26). There are a few examples of elaborately designed synthetic antibody libraries with multiple variable heavy and light chain framework regions, such as the HuCAL libraries (12-14) and the Ylanthia library (15). HuCAL and HuCAL GOLD consist of the consensus sequences of seven VH, four Vκ and three Vλ germline families, making a total of 49 VH-VL framework combinations. HuCAL PLATINUM has six VH, three Vκ and three Vλ framework sequences, because the rarely utilized VH4 and Vκ4 sequences were excluded. One potential advantage of libraries with multiple framework sequences is that they can accommodate different CDR canonical structures (27, 28), which are known to depend on both CDR and framework sequences and might influence the antibody binding properties to different types of antigens (29). The framework design of the Ylanthia library focuses on improving the physicochemical properties and the developability: 36 fixed VH-VL pairs were selected based on properties such as their natural prevalence, canonical CDR structures, post-translational modification motifs, isoelectric point, expression yields, aggregation propensity, melting temperature (Tm), and serum stability.
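The count of 49 framework combinations quoted for HuCAL and HuCAL GOLD follows from pairing each of the seven VH frameworks with each of the seven light chain frameworks (four Vκ plus three Vλ). The short Python sketch below simply enumerates these pairings; the family labels are placeholders chosen for illustration, and only the counts are taken from the text.

```python
from itertools import product

# Placeholder labels; only the counts (7 VH, 4 V-kappa, 3 V-lambda) come from the text.
VH = [f"VH-{i}" for i in range(1, 8)]
VK = [f"Vkappa-{i}" for i in range(1, 5)]
VL = [f"Vlambda-{i}" for i in range(1, 4)]


def framework_pairs(heavy, kappa, lam):
    """Pair every heavy chain framework with every kappa or lambda framework."""
    return list(product(heavy, kappa + lam))


pairs = framework_pairs(VH, VK, VL)
print(len(pairs))  # 7 x (4 + 3) = 49
```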
DIVERSIFICATION OF CDRs
The simplest method for CDR diversification is the synthesis of random sequences using nucleotide mixtures. All 20 amino acids plus the amber stop codon (TAG) are encoded by the NNK or NNS degenerate codon (N is any of the four deoxyribonucleotides; K = G or T; S = G or C), and other combinations of nucleotide mixtures can produce codons encoding different sets of amino acids. Examples of commonly used degenerate codons in CDR design include KMT (M = A or C), which encodes Ala, Asp, Ser, or Tyr; WMC (W = A or T) for Asn, Ser, Thr, or Tyr; and RRT (R = A or G) for Asn, Asp, Gly, or Ser. These degenerate codons are relatively easy to design and cost-efficient, and highly functional antibody libraries have been constructed using this method (8, 10, 11). The major limitation of this approach is the low precision of the sequence diversification. Unlike the trinucleotide phosphoramidite method or Slonomics technology (see below), which can incorporate any combination of codons at the desired position, the random degenerate codon method allows only those codons that are in the same row or column of the codon table, or combinations of such codons. For example, tyrosine (TAY, Y = T or C) and glycine (GGN) cannot be encoded by the same degenerate codon without also including cysteine (TGY) and aspartate (GAY). This poses a problem when one tries to design an antibody library that emulates natural CDR sequences, because many “unnatural” CDR sequences are also synthesized and incorporated into the library. However, it should be noted that this problem does not preclude the library from producing high-affinity, target-specific antibodies to most antigens.
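The amino acid sets quoted above can be checked by expanding each degenerate codon against the standard genetic code. The following Python sketch is illustrative only: the helper names expand and encoded are hypothetical, and the IUPAC table is limited to the symbols used in this section.

```python
from itertools import product

# IUPAC nucleotide codes used in the text (plus the four single bases).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "K": "GT", "M": "AC", "R": "AG", "S": "CG",
    "W": "AT", "Y": "CT", "N": "ACGT",
}

# Standard genetic code with bases ordered T, C, A, G; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand(degenerate):
    """Return every concrete codon covered by a three-letter degenerate codon."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in degenerate))]


def encoded(degenerate):
    """Return the sorted one-letter residues ('*' = stop) the codon can encode."""
    return sorted({CODON_TABLE[c] for c in expand(degenerate)})


for codon in ("NNK", "NNS", "KMT", "WMC", "RRT", "KRT"):
    print(codon, len(expand(codon)), "".join(encoded(codon)))
```

The last entry, KRT, is the smallest degenerate codon that covers both a tyrosine codon (TAT) and a glycine codon (GGT); its expansion also contains TGT (Cys) and GAT (Asp), which is the row-and-column constraint described in the text.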
The trinucleotide phosphoramidite method, or trinucleotide-directed mutagenesis (TRIM), utilizes a set of pre-synthesized trinucleotide codon units for the synthesis of diversified CDRs (12, 14). By using mixtures of these units in the oligonucleotide synthesis, one can incorporate only the desired amino acids, in a desired ratio, at any position. Consequently, CDRs can be designed to have a more nature-like distribution of amino acids, so that the antibodies better resemble natural antibodies. HuCAL has TRIM-based diversity in CDR-H3 and CDR-L3, with the amino acid frequency roughly reflecting that found in natural CDRs. HuCAL GOLD has a more sophisticated design with all six CDRs diversified. CDRs with different canonical structures were designed for CDR1 and CDR2 of the light and heavy chains, so that the resulting library covers most of the canonical structure combinations as well as the amino acid compositions found in natural antibodies. HuCAL PLATINUM further improved the library design, incorporating the length-dependent amino acid usage in CDR-H3 and codon optimization for a higher expression level. The TRIM strategy is a significant technological improvement over simple degenerate codon synthesis, yet the method still relies on the random combination of the trinucleotide units, which inevitably introduces some unnatural CDR sequences. Also, both random oligonucleotide synthesis and TRIM suffer from synthesis errors that cause mutations, frameshifts, and/or unintended length variations in the CDRs (11, 13), lowering the functional diversity of the libraries. A novel solid-phase gene synthesis technology (Slonomics® technology) (30) has been utilized for the generation of CDR diversity with defined amino acid composition and frequency, and a higher proportion of functional clones. In one example, a synthetic library with more nature-like CDR sequences was designed and constructed (31).
The library has four heavy chain and two kappa light chain frameworks, and the first and second CDRs of the heavy and light chains were based on the germline CDR sequences for the respective framework, with mutations that reflected the somatic hypermutation frequencies found in natural human antibody repertoire at each CDR position. The third CDRs of the heavy and light chains of the library were synthesized to mimic the amino acid frequency at each position of the CDR-H3 and CDR-L3 of the natural human antibody repertoire. As a result, this library has more nature-like sequences, which conceivably have better functionality, stability and folding behavior than libraries with a lot of unnatural sequences (32). In another example utilizing the technology for antibody library construction, CDR3 of the heavy and the light chains were synthesized to reflect the length variation and the amino acid usage of the corresponding regions of naturally occurring human antibodies, where the usage of certain PTM-prone amino acids such as Asn, Asp, and Met was decreased or lacking (15).
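The idea common to the codon-unit approaches described above, namely choosing the allowed residues and their ratios independently at each CDR position, can be illustrated with a small simulation. The Python sketch below is not a model of any published library: the residue weights, the four-position stretch, and the one-codon-per-residue table are all hypothetical values chosen for illustration.

```python
import random

# One codon per residue; this particular codon choice is an illustrative assumption.
CODON = {"Y": "TAT", "S": "TCT", "G": "GGT", "A": "GCT", "D": "GAT"}

# Hypothetical per-position residue weights for a four-residue CDR stretch.
POSITION_WEIGHTS = [
    {"Y": 0.5, "S": 0.3, "G": 0.2},
    {"G": 0.6, "S": 0.4},
    {"Y": 0.4, "A": 0.3, "D": 0.3},
    {"D": 1.0},
]


def sample_cdr(position_weights, rng):
    """Draw one residue per position according to that position's weights."""
    residues = [
        rng.choices(list(w), weights=list(w.values()))[0]
        for w in position_weights
    ]
    protein = "".join(residues)
    dna = "".join(CODON[r] for r in residues)
    return protein, dna


rng = random.Random(0)  # fixed seed so repeated runs give the same draws
samples = [sample_cdr(POSITION_WEIGHTS, rng) for _ in range(5)]
for protein, dna in samples:
    print(protein, dna)
```

Because every position draws only from its own list of codon units, no stop codons or undesignated residues can appear by design, in contrast to a degenerate codon, which admits every residue its nucleotide mixture happens to cover.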
The synthetic antibody libraries differ not only in the CDR sequence diversification strategy, but also in which CDRs are diversified. Many high-quality libraries have all six CDRs diversified (11, 13, 14, 26, 31), achieving the maximal conformational diversity needed for the efficient isolation of antibodies against various epitopes of different types of antigens. On the other hand, there are many synthetic antibody libraries with only a smaller number of CDRs diversified, in most cases CDR-H3 and CDR-L3 (8, 10, 12). These libraries have a simpler design and are relatively easy to construct. The natural germline sequences of their invariable CDRs are likely to contribute to better folding and stability, at the expense of lower conformational and sequence diversity, and such libraries have been successfully utilized in the generation of high-affinity, target-specific antibody molecules. One thing to consider regarding the number of diversified CDRs is the functionality of the library. The synthesis of oligonucleotides for the diversified CDRs is inherently vulnerable to synthetic errors, often introducing mutations, insertions, and deletions of nucleotides that can result in nonfunctional antibody clones. Because errors in any one CDR can inactivate a clone, the proportion of functional clones decreases geometrically as the number of diversified CDRs increases; therefore, with six diversified CDRs a significant proportion of the clones in the library can be unproductive (11, 13). Incidentally, this is also true for an antibody library with a very long CDR (33), in which errors can accumulate during the synthesis. A proofreading step, such as β-lactamase genetic fusion selection, can be added during the construction of the synthetic antibody library in order to increase the proportion of productive clones (11, 13).
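The dependence of library functionality on the number of diversified CDRs can be sketched numerically. The Python example below assumes, purely for illustration, that each diversified CDR is error-free with the same probability p and that errors in different CDRs are independent; under that assumption the functional fraction is p raised to the number of diversified CDRs. The value p = 0.9 is hypothetical and is not taken from any of the cited libraries.

```python
def functional_fraction(p_per_cdr, n_cdrs):
    """Fraction of clones in which all n diversified CDRs are error-free,
    assuming independent errors with the same per-CDR success probability."""
    return p_per_cdr ** n_cdrs


def nonfunctional_fraction(p_per_cdr, n_cdrs):
    """Complement of the functional fraction."""
    return 1.0 - functional_fraction(p_per_cdr, n_cdrs)


P_PER_CDR = 0.9  # hypothetical per-CDR probability of an error-free synthesis
for n in (1, 2, 6):
    print(n, round(functional_fraction(P_PER_CDR, n), 3),
          round(nonfunctional_fraction(P_PER_CDR, n), 3))
```

Under these assumptions, a library with two diversified CDRs retains about 81% functional clones, whereas one with six diversified CDRs retains only about 53%, which illustrates why a proofreading step becomes more valuable as more CDRs are diversified.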
While CDR-H3 and CDR-L3 are centrally located in the antigen binding site, have a far greater diversity (especially CDR-H3) than other CDRs due to the V(D)J recombination, and play disproportionately large roles in antigen binding (34), there are examples of synthetic antibody libraries with minimal or no diversity in these regions. Minimalist CDR libraries with tetranomial (35) or even binomial (36,37) diversity have been shown to yield specific antibodies against multiple targets, suggesting that a restricted set of amino acids has a sufficient chemical repertoire for the formation of diverse antigen-combining surfaces. Target binders could even be isolated from a library with no CDR-H3 diversity (38). There is also a report indicating that CDR-L3 diversification negatively affects the library fitness (39). Taken together, these examples imply that while the total sequence diversity of the antigen combining site is important for the functionality of the antibody library, different viable methods of diversification and distributing the diversity among the CDRs, may exist.
OPTIMIZING MOLECULAR PROPERTIES
An antibody library is a mixture of billions of different antibody molecules, each of which has different physical, chemical, and biological properties. For an antibody molecule to be manufactured, purified, and stored in large quantities for commercial purposes, molecular properties such as the level of expression, stability, and solubility need to be optimal (40), and antibody optimization can be time- and cost-intensive if the parental molecule has poor molecular properties. Consequently, it is desirable to generate initial hit antibodies that already have the desired physicochemical and biological properties. The molecular properties of antibodies from natural sources (e.g. monoclonal antibodies from hybridomas or natural antibody libraries) are unpredictable, and many have unfavorable properties that could hamper the downstream development processes. On the other hand, a deliberate design for improved molecular properties can be employed in the construction of synthetic antibody libraries. For example, the expression level of the antibody can be improved by codon optimization (41), and undesirable N-glycosylation motifs can be minimized or removed from the antibody repertoire (14, 15). In one highly sophisticated example (15), several hundred human heavy chain-light chain (HC-LC) pairs, in Fab and IgG formats, were tested experimentally for their biochemical and biophysical properties such as expression level, stability, isoelectric point, and aggregation propensity. A few dozen HC-LC pairs with highly desirable properties were further optimized by removing or minimizing potential post-translational modification motifs, and these were used as templates for Fab library construction.
Many therapeutic antibody lead molecules with excellent functional activity fail to proceed to the downstream development phase because of poor molecular properties (CMC liabilities) (42). Thus, the synthetic antibody library enables a “quality by design” approach (43) that may facilitate the antibody development process.
The in vitro display technologies have enabled rapid generation of target-binding antibodies with exquisite specificity. Large repertoires of antibody fragments can be rapidly interrogated for clones with the desired binding characteristics. In particular, synthetic antibody libraries tailored for better developability are valuable sources of high-quality antibodies suitable for commercial development. Enhanced knowledge about the sequence, structure, function, and physicochemical and biological characteristics of antibodies has allowed researchers to design and construct highly functional, sophisticated synthetic antibody libraries that produce antibodies that rival or surpass antibodies of natural origin in their binding affinity and specificity, chemical and physical stability, solubility, and level of expression. As the demand for high-quality antibodies as clinical, diagnostic, and research agents continues to increase, synthetic antibody libraries with novel, improved designs are likely to become indispensable tools for the generation of such antibodies.