Introduction
Steady advances in algorithms for both NOE assignment and structure calculation have made automatic calculation of protein 3D structures from raw NOE data possible, provided that chemical shifts are assigned for most atoms and sufficient peaks exist.1 The algorithms automate an iterative process in which NOE peak assignment is coupled with structure calculation using the new restraints. Owing to the current state of computational power, several studies have reported that fully automatic structure calculations from processed NMR data are feasible without any manual interpretation.2-5 However, the improvement and refinement of NMR structures are still believed to depend largely on the skill and experience of the researchers interpreting and calculating the structures.
Despite the sensitivity gains of NMR hardware and the development of new experiments over the last decades, the number of experimentally obtainable NMR restraints for determining the coordinates of a 3D structure is still much smaller than that obtainable using X-ray crystallography. Computational aids have contributed to improving the structural quality of NMR structures. These efforts fall into two categories: the use of empirical databases and the application of sophisticated calculation methods stemming mostly from molecular dynamics (MD) simulations. An MD simulation consists of two main parts, an atomistic force field and a conformational space search. Since NMR structure calculations make use of experimental restraints, force fields simpler than those used in conventional MD simulations have been employed,6,7 allowing for conformational searches at higher temperatures without concern for the instability of the system during the computation. Through advances in algorithms and computational power, MD simulation has grown popular. All-atom force fields, which enable a biomolecule to behave in a realistic way during MD simulation, have matured, helping to reconcile discrepancies between experimental and simulated data. Along with the force field, implicit solvent models, mostly generalized-Born models, have also improved the quality of the resulting structures by approximating the effects of explicit solvent at reduced computational cost, yielding outcomes close to those expected with explicit solvent.8
Determination of NMR structures with an all-atom force field and a generalized-Born model (hereafter GBIS), restrained with experimental data, has successfully improved NMR structures, particularly in the refinement stage of the calculations. In particular, GBIS refinement is advantageous for improving local geometries that are poorly defined owing to a lack of experimental restraints. For instance, GBIS has helped in the unambiguous positioning of the donors and acceptors of hydrogen bonds, allowing further insight into the pH dependence of binding affinity in the complex between UIM and ubiquitin.9 Besides improving local geometry, GBIS refinement is also effective in determining the global fold. As reported by Brooks and his colleagues, GBIS refinement could determine 3D folds with less than 10% of the original NOE data.10,11
Considering its potential, the applications of GBIS for refining NMR structures are likely to increase, particularly for proteins whose structures are difficult to determine with conventional methods. Such proteins include membrane proteins and complexes consisting of multiple proteins. Despite the advances in algorithms, however, the effects of GBIS on NMR structure calculations are not straightforward, necessitating quantitative analyses. In this paper, we calculated 3D structures of ubiquitin (UBQ) and GB1 using conventional and GBIS methods with a variety of subsets of the experimental NMR distance restraints. The detailed quantitative interpretation facilitates the comparison of the results from the GBIS method with those from the conventional methods.
Experimental
Restraints for NMR Structure Calculation. Experimental restraints for calculating the structures of UBQ and GB1 were extracted from the PDB database (http://www.rcsb.org), where they are deposited as 1D3Z and 3GB1, respectively. Only the distance and backbone torsion angle restraints were employed. The numbers of distance restraints are 1,446 and 584 for ubiquitin and GB1, respectively. The numbers of restraints for UBQ (GB1) are 288 (122), 294 (122), 236 (83), and 628 (257) for intra (|i-j|=0), sequential (|i-j|=1), medium (1<|i-j|<5), and long (|i-j|>4) ranges, respectively. The torsion angle restraints consist of 62 ϕ angles for ubiquitin, whereas those for ϕ and ψ in GB1 number 52 and 49, respectively. Distance restraints were randomly omitted from the original set to prepare subsets containing 0.05, 0.1, 0.3, 0.5, and 0.7 times the original number of restraints. For each fraction, five different restraint sets were generated with different random seeds to obtain statistics.
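The preparation of the restraint subsets can be illustrated with a short sketch. It is not the script used in this work: the function names, the toy restraint list, and the choice to sample uniformly over all restraints (rather than within each range class) are assumptions made only for illustration. The per-range counts are those quoted above.

```python
import random

# Per-range counts of distance restraints quoted in the text.
UBQ_COUNTS = {"intra": 288, "sequential": 294, "medium": 236, "long": 628}
GB1_COUNTS = {"intra": 122, "sequential": 122, "medium": 83, "long": 257}

def classify_range(i, j):
    """Range class of a distance restraint between residues i and j."""
    sep = abs(i - j)
    if sep == 0:
        return "intra"
    if sep == 1:
        return "sequential"
    if sep < 5:
        return "medium"
    return "long"

def subsample_restraints(restraints, fraction, seed):
    """Keep a random subset holding `fraction` of the restraints.

    Uniform sampling over the whole list is an assumption; the seed makes
    each subset reproducible.
    """
    rng = random.Random(seed)
    n_keep = round(len(restraints) * fraction)
    return sorted(rng.sample(restraints, n_keep))

# Hypothetical toy list of (residue_i, residue_j, upper_bound_in_angstrom).
toy = [(i, j, 5.0) for i in range(1, 21) for j in range(i, i + 10)]

# Five seeds for each of the five fractions, as described in the text.
subsets = {
    (fraction, seed): subsample_restraints(toy, fraction, seed)
    for fraction in (0.05, 0.1, 0.3, 0.5, 0.7)
    for seed in range(5)
}
```

The per-range counts sum to the reported totals (1,446 for UBQ and 584 for GB1), which serves as a quick consistency check on the classification.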
Protocols for NMR Structure Calculations. Structure calculation with each set of restraints consisted of two steps: a CYANA calculation and an AMBER-based refinement starting from the CYANA result. We first calculated 300 structures of UBQ and GB1 with the experimental distance and torsion angle restraints using CYANA,6 employing 20,000 steps of torsion angle dynamics. The top 100 CYANA structures that did not show significant violations against the experimental inputs were chosen for further refinement, and the top 20 structures were used for analyses. We then refined the structures using the AMBER package (ver. 12) with GBIS options, namely the generalized-Born implicit solvent model (igb = 5)13 and the ff99SB all-atom force field.12 As the conformational search method for GBIS, we applied restrained simulated annealing for 20 ps with the PMEMD module. The temperature was increased to 1,000 K during the first quarter, held at 1,000 K during the second quarter, and then cooled stepwise to 0 K over the latter half. The force constants for distance and torsion angle restraints were 50 kcal mol−1 Å−2 and 200 kcal mol−1 rad−2, respectively. The integration time step for restrained simulated annealing was 1 fs. Of the 100 structures, the 20 that showed the lowest energies with no significant violation of the distance (< 0.5 Å) and torsion angle (< 5°) restraints were selected as an ensemble for further analyses. All calculations were performed on a Linux cluster consisting of 120 cores.
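The temperature profile and the ensemble selection described above can be sketched as follows. This is an illustration only and not an AMBER input file: the linear heating ramp starting from 0 K, the number of cooling stages, the function names, and the dictionary keys are all assumptions. The 20,000 steps follow from 20 ps at a 1 fs time step.

```python
def annealing_temperature(step, total_steps=20_000, t_max=1000.0, n_cool_stages=10):
    """Target temperature in K at a given MD step (20 ps at 1 fs = 20,000 steps)."""
    frac = step / total_steps
    if frac < 0.25:
        # First quarter: heat up (a linear ramp from 0 K is an assumption).
        return t_max * frac / 0.25
    if frac < 0.5:
        # Second quarter: hold at the maximum temperature.
        return t_max
    # Latter half: stepwise cooling to 0 K (the number of stages is an assumption).
    stage = min(int((frac - 0.5) * 2 * n_cool_stages), n_cool_stages - 1)
    return t_max * (n_cool_stages - stage - 1) / n_cool_stages

def select_ensemble(structures, n=20, max_dist_viol=0.5, max_angle_viol=5.0):
    """Pick the n lowest-energy structures without significant violations.

    Each structure is a dict with hypothetical keys 'energy' (kcal/mol),
    'max_dist_viol' (angstrom) and 'max_angle_viol' (degrees).
    """
    accepted = [
        s for s in structures
        if s["max_dist_viol"] < max_dist_viol and s["max_angle_viol"] < max_angle_viol
    ]
    return sorted(accepted, key=lambda s: s["energy"])[:n]
```

When fewer than `n` structures pass the violation filter, `select_ensemble` simply returns a smaller ensemble.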
Criteria for Quantitative Analyses of NMR Structures. For quantitative analyses, in addition to the energy values obtained using AMBER, the resulting structures were compared in terms of two backbone RMSDs: one among the resulting structures (eRMSD; mean root-mean-square deviation of the ensemble members from the mean structure) and the other between the resulting structures and the reference X-ray structures, 1UBQ for UBQ and 2QMT for GB1 (rRMSD; mean root-mean-square deviation of the ensemble members from the reference structure). We chose residues 1–70 and 1–56 in UBQ and GB1, respectively, for the RMSD calculations. Two highly refined structures, UBQ (1D3Z) and GB1 (3GB1), determined with residual dipolar couplings and coupling constants as additional experimental restraints, were also included for comparison. In addition, we calculated the fraction of residues in the most favored regions of the Ramachandran plot and the MolProbity packing score. These parameters were calculated by the PROCHECK-NMR and MolProbity software packages,13,14 respectively.
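The two RMSD measures can be written compactly. The sketch below assumes that all models and the reference have already been superimposed on the selected backbone atoms; the superposition step itself is omitted, and the function names are hypothetical.

```python
import math

def rmsd(a, b):
    """RMSD between two equally long lists of (x, y, z) coordinates."""
    squared = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(squared / len(a))

def mean_structure(ensemble):
    """Atom-wise average coordinates over all models of an ensemble."""
    n_models = len(ensemble)
    n_atoms = len(ensemble[0])
    return [
        tuple(sum(model[k][d] for model in ensemble) / n_models for d in range(3))
        for k in range(n_atoms)
    ]

def ermsd(ensemble):
    """Precision: mean RMSD of the ensemble members from the mean structure."""
    mean = mean_structure(ensemble)
    return sum(rmsd(model, mean) for model in ensemble) / len(ensemble)

def rrmsd(ensemble, reference):
    """Accuracy: mean RMSD of the ensemble members from the reference structure."""
    return sum(rmsd(model, reference) for model in ensemble) / len(ensemble)
```

For a toy ensemble of two models displaced by 2 Å from each other, both models lie 1 Å from the mean structure, so `ermsd` returns 1.0.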
Results and Discussion
GBIS Improved the Geometries of Side-Chain Atoms More Than the Results Refined with Extensive NMR Data. The degree of improvement in the NMR structures was assessed by eRMSD and rRMSD, which represent the precision and accuracy of an ensemble of NMR structures, respectively. The structures refined by GBIS with the PDB-deposited distance and backbone torsion angle restraints displayed significant improvements in quality (Tables 1 and 2). eRMSDGBIS decreased to 0.28 and 0.29 Å from eRMSDCYA values of 0.51 and 0.46 Å for UBQ and GB1, respectively. Similarly, rRMSDGBIS improved to 0.62 and 0.59 Å from rRMSDCYA values of 0.91 and 0.68 Å for UBQ and GB1, respectively. We would like to note that such improvements were achieved by calculation alone, without further experiments that would require additional samples and NMR measurements. The degrees of improvement were comparable to previous results, supporting the use of GBIS for refining NMR structures. Nevertheless, it would be inadequate to compare the GBIS structures directly with the CYANA structures, because the force field of CYANA is simplified and optimized for calculation speed and automatic procedures. It is more noteworthy that the GBIS-refined structures displayed improved side-chain geometries even compared with the high-precision NMR structures, 1D3Z for UBQ and 3GB1 for GB1. The structures of 1D3Z and 3GB1 were calculated with extensive experimental restraints that include residual dipolar and J-scalar couplings, resulting in very precise and accurate backbone geometries: 0.09 and 0.18 Å for eRMSD and 0.29 and 0.58 Å for rRMSD in UBQ and GB1, respectively. We quantified the geometric quality of side chains using the MolProbity clash score, where a lower value indicates better geometry. For example, a MolProbity score of 0.081 corresponds to the 100th percentile. The GBIS-refined structures of both UBQ and GB1 showed improvements in the MolProbity score.
In particular, the decrease in GB1 from 1.785 (run-0) to 0.670 (run-1) was marked. This can be explained by the fact that most of the additional restraints from residual dipolar and J-scalar couplings are confined to backbone atoms, whereas the effects of GBIS extend to the side chains. Our data indicate that GBIS refinement is still worthwhile even for structures that have been highly refined with experimental restraints.
Table 1. (a) The numbers in parentheses for the distance restraints are those for long-range NOEs. (b) The number of conformations in an ensemble; 20 unless otherwise mentioned. (c) “n.a.” means “not available”. (d) For the Ramachandran analysis, only the most favored regions are considered.
Table 2. GBIS-refined GB1 structures. (a) All the values were prepared following the same rules as in Table 1.
GBIS Greatly Improved both Precision and Accuracy when the Number of Restraints Decreased. GBIS improved all the structures that were calculated with subsets of the distance restraints. Omitting more distance restraints elevated eRMSDCYA and rRMSDCYA, indicating the difficulty of finding precise and accurate conformations by CYANA calculations. The decreases of eRMSDGBIS and rRMSDGBIS from eRMSDCYA and rRMSDCYA, respectively, were marked. However, the side-chain packing qualities from MolProbity scores were indistinguishable in most of the GBIS-refined structures (Tables 1 and 2). Plots of eRMSDCYA versus (eRMSDCYA-eRMSDGBIS) and of rRMSDCYA versus (rRMSDCYA-rRMSDGBIS) clearly show the degrees and tendencies (Fig. 1). Note that higher values on the Y-axes indicate larger improvements by GBIS. All the values were greater than zero. The apparent linearity between RMSDCYA and (RMSDCYA-RMSDGBIS) means that the improvements were proportional to RMSDCYA. However, there were dissimilarities between the UBQ and GB1 cases. None of the CYANA calculations of UBQ or GB1 generated an ensemble of structures with eRMSD and rRMSD values of less than 1 Å when the restraints were reduced more than 10-fold (Tables 1 and 2). GBIS could not yield structures of UBQ as precise and accurate as those of GB1. Whereas there were significant improvements in all the GBIS results of UBQ, most of the rRMSDGBIS values were greater than 1 Å with 10% restraints and greater than 2 Å with 5% restraints. For example, the value of rRMSDGBIS at run-55 was 5.09 Å, indicating that the structures of UBQ do not agree with the X-ray structure. On the other hand, GBIS did enable GB1 to form structures that had comparable qualities in both precision (< 1 Å) and accuracy (< 1 Å) to those of the reference. In particular, for GB1 at run-53, eRMSD was reduced from 3.37 to 0.45 Å and rRMSD from 10.00 to 0.56 Å. It is remarkable that GBIS enabled a wrong fold to recover into a correct structure.
Nevertheless, it should be noted that when the structures were refined with only 5% restraints, the number of structures that did not show significant distance violations against the input data was only 10 and 9 at run-51 and -52, respectively. In all other cases, the numbers of non-violating structures exceeded 20. The shortfall at run-51 and -52 might be caused by the inefficiency of the conformational search, due to either insufficient restraints to guide the structures to the correct fold or an initial wrong geometry that cannot be escaped within the given sampling steps of the conformational search.
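The improvement plotted on the Y-axes of Fig. 1 is simply RMSDCYA minus RMSDGBIS. As a worked example, the sketch below evaluates it for the values quoted in the text; the dictionary labels and the function name are hypothetical, and only numbers stated above are used.

```python
# (RMSD_CYA, RMSD_GBIS) pairs in angstrom, as quoted in the text.
quoted = {
    "UBQ, all restraints, eRMSD": (0.51, 0.28),
    "GB1, all restraints, eRMSD": (0.46, 0.29),
    "UBQ, all restraints, rRMSD": (0.91, 0.62),
    "GB1, all restraints, rRMSD": (0.68, 0.59),
    "GB1, run-53, eRMSD": (3.37, 0.45),
    "GB1, run-53, rRMSD": (10.00, 0.56),
}

def improvement(rmsd_cya, rmsd_gbis):
    """Improvement by GBIS; positive values mean GBIS lowered the RMSD."""
    return round(rmsd_cya - rmsd_gbis, 2)

improvements = {label: improvement(*pair) for label, pair in quoted.items()}

for label, delta in improvements.items():
    print(f"{label}: {delta:.2f}")
```

The larger the starting RMSDCYA, the larger the improvement in these examples, which is consistent with the proportionality noted above.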
Figure 1. Plots of eRMSDCYA and (eRMSDCYA-eRMSDGBIS) (a), and rRMSDCYA and (rRMSDCYA-rRMSDGBIS) (b). Labels of the X- and Y-axes are written at the bottom and top of each plot, respectively. Blue squares correspond to the ubiquitin runs, whereas red circles represent GB1. Run-55 of ubiquitin and run-53 of GB1 are marked with dashed circles in blue and red, respectively. Note that higher values on the Y-axes indicate larger improvements by GBIS.
Ensemble RMSD of GBIS-refined Structures Could be Used as a Validation Metric. The improvements by GBIS led to decreased rRMSD, which in turn gave a better correlation between eRMSDGBIS and rRMSDGBIS than between eRMSDCYA and rRMSDCYA (Fig. 2(a), (b)). One of the main concerns in calculating NMR structures of biomolecules is the validation of the fold of the resulting structures. An NOE, from which a distance restraint is prepared, arises between two atoms located close together in space (< 6 Å). Because of this short-range nature, evaluating the contribution of each distance restraint to the global fold in a quantitative way is not straightforward. In addition, one cannot discriminate between accurate and inaccurate structures using only the violation and energy information. Note that the data for run-55 did not show any significant violation of the input restraints but yielded inaccurate results. Proper global parameters that allow the soundness of NMR structures of biomolecules to be judged have long been sought. We found that eRMSDCYA and eRMSDGBIS showed apparent correlations with rRMSDCYA and rRMSDGBIS, respectively (Fig. 2(a), (b)). AMBER energy also correlated with rRMSDGBIS (Fig. 2(c)). The Pearson’s correlation coefficients (R) were 0.91 for eRMSDCYA versus rRMSDCYA, 0.97 for eRMSDGBIS versus rRMSDGBIS, and 0.73 for AMBER energy versus rRMSDGBIS. It would be difficult to apply the AMBER energy directly for validation without any calibration, since the value depends on the protein and it is not known a priori which value is low enough. The better correlation of rRMSD with eRMSDGBIS than with eRMSDCYA suggests eRMSDGBIS as a metric for validation. At the least, it can indicate when the accuracy of the results should be suspected, i.e., when the value of eRMSDGBIS is larger than 2 Å.
It is noteworthy that the representative structures reflect AMBER energies in part, because the top 20 structures are selected on the basis of their AMBER energies. Further studies in this direction are necessary for the general use of eRMSDGBIS as a validation metric.
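The correlation analysis and the suggested 2 Å warning threshold can be sketched as follows. The function names are hypothetical, and the per-run RMSD values from Tables 1–3 are not reproduced here, so they must be supplied by the reader.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def accuracy_is_suspect(ermsd_gbis, threshold=2.0):
    """Flag an ensemble whose eRMSD_GBIS (angstrom) exceeds the 2 angstrom threshold."""
    return ermsd_gbis > threshold
```

Calling `pearson_r` with the per-run eRMSDGBIS values as `xs` and the corresponding rRMSDGBIS values as `ys` gives the coefficient R in the sense used in the text. An ensemble with eRMSDGBIS of 0.28 Å, as found for UBQ with all restraints, would not be flagged.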
Figure 2. Plots of eRMSDCYA and rRMSDCYA (a), eRMSDGBIS and rRMSDGBIS (b), and ZGBIS and rRMSDGBIS (c). To compare the ubiquitin and GB1 cases fairly, we calculated Z values by dividing the difference between each AMBER energy and the mean AMBER energy of the protein by the standard deviation of the AMBER energies of that protein. Pearson’s correlation coefficients (R) are also shown.
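The Z-value normalization described in the caption of Figure 2 can be sketched as below. The use of the sample standard deviation (rather than the population one) and the function name are assumptions, since the text does not specify them.

```python
import statistics

def z_scores(energies):
    """Z values of AMBER energies within one protein.

    Each energy is shifted by the mean and divided by the standard deviation
    of the energies of the same protein. The sample standard deviation is an
    assumption; statistics.pstdev would give the population version.
    """
    mean = statistics.mean(energies)
    sd = statistics.stdev(energies)
    return [(energy - mean) / sd for energy in energies]
```

Because the normalization is applied per protein, energies of UBQ and GB1 runs become comparable on a common scale even though their absolute AMBER energies differ.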
GBIS with Additional Psi Angle Restraints Improved Precision and Accuracy of Ubiquitin Structures. To compare the performance of GBIS in UBQ and GB1 refinements fairly, and to learn whether the inefficiency of the GBIS protocol in UBQ run-41–55 originated from the lack of backbone torsion angle restraints, we added psi angle restraints and performed GBIS refinements of UBQ. The PDB-deposited restraints of UBQ do not contain psi angle restraints. Psi angle restraints are known to be important in discerning secondary structures, and they can be easily prepared, provided the chemical shifts are assigned. We extracted the chemical shifts of UBQ from the BMRB database (BMRB ID: 6457) and generated 59 psi angle restraints with the TALOS+ software15 using a database that excluded ubiquitin. We recalculated the runs with 5 and 10% distance restraints with the additional torsion angle restraints (Table 3). Note that the distance restraints were identical between runs 41–45 and 61–65 and between runs 51–55 and 71–75. Addition of the psi angle restraints considerably improved the quality of the results from both CYANA and GBIS. All the rRMSDGBIS values with 10% restraints fell below 1.0 Å. Except for run-75, all the rRMSDGBIS values with 5% restraints were lower than 2.0 Å. Nevertheless, the accuracy and precision were still lower than those obtained with GB1. Our data support the idea that the contributions of individual distance restraints to determining NMR structures vary and depend on the protein.
Table 3. GBIS-refined ubiquitin structures with additional ψ angle restraints
The Efficiency of the Conformational Space Search was More Important than the Number of Conformations Generated for Calculating Accurate Structures. The next question was whether enhancing the GBIS refinement could allow a structure trapped in an inaccurate conformation to escape toward the accurate one with only sparse experimental restraints. Our data showed that GBIS was typically, but not always, effective with sparse experimental restraints. The results at run-75 indicated that the structures were inaccurate, even though there was no significant violation of the input restraints. Because our original protocol did not work properly for the restraints of run-75, we considered two ways to improve GBIS efficiency. One was to increase the number of structures calculated, and the other was to increase the duration of the dynamics. These two methods were applied to the recalculation of run-75 with identical restraints. We first generated 200 and 500 structures (run-81 and -82) with the same protocol as the calculations in Tables 1–3. Second, we increased the duration of the dynamics 2- and 5-fold, leading to 40- and 100-ps restrained simulated annealing, respectively, with 100 structures (run-83 and -84). In both cases, we again selected 20 structures as the final ensemble. The results showed that the runs with longer durations generated better eRMSD and rRMSD values (Table 4). The 100-ps GBIS refinement (run-84) generated results of comparable quality to the other 5% results from run-71–74. Visual inspection of the structures from run-55, run-75, and run-81–84 shows which parts improved with the improved methods (Fig. 3). While α-helical regions that are confined by short-range restraints were well refined, β-strands did not converge owing to the lack of long-range distance restraints.
The data clearly demonstrate that the efficiency of the conformational space search is more important than the number of conformations generated for finding the correct fold, at least under the conditions in which GBIS was applied. We do not intend to argue over the optimization of the GBIS protocol in this study, but it is quite possible that advanced sampling methods, including replica exchange,16 could lead to better results than restrained simulated annealing.
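The comparison between the two strategies is easier to appreciate when the total simulated time per run is tabulated. The sketch below uses only the structure counts and annealing durations stated in the text; treating run-75 as 100 structures of 20 ps each follows the protocol in the Experimental section, and the variable names are hypothetical.

```python
# (number of structures refined, annealing duration per structure in ps)
runs = {
    "run-75": (100, 20),   # original protocol
    "run-81": (200, 20),   # more structures
    "run-82": (500, 20),   # more structures
    "run-83": (100, 40),   # longer annealing
    "run-84": (100, 100),  # longer annealing
}

# Total simulated time per run in ns (1 ns = 1000 ps).
total_ns = {name: n * ps / 1000 for name, (n, ps) in runs.items()}

for name, ns in total_ns.items():
    print(f"{name}: {ns:g} ns")
```

Under these assumptions, run-81 and run-83 both amount to 4 ns, and run-82 and run-84 both amount to 10 ns, so each pair compares more structures against longer annealing at an equal total simulated time.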
Table 4. GBIS-refined ubiquitin structures with more structures or longer time-steps
Figure 3. Overlaid 20 structures of ubiquitin. The top 20 structures were overlaid on the backbone atoms of residues 1–70. Each ensemble is labeled with the corresponding run number. The ribbon diagram shows the reference X-ray structure (PDB ID: 1UBQ).
Conclusion
A more accurate and precise structure is helpful in studying and applying the function of a protein. For example, detailed information on protein-protein and protein-ligand complex interfaces can facilitate the discovery of more effective inhibitors. However, NMR signals interpreted as structural restraints are often invisible when the system undergoes unfavorable motion on the NMR timescale, which in turn leads to inaccurate structures. Therefore, determining 3D structures with a limited number of restraints has a wide range of applications. Adding to previous data that showed the strength of GBIS for fixing local geometries,9,17-20 the role of GBIS has been extended to improving global folds. Brooks and his colleagues reported the application of GBIS for determining accurate global folds in a series of proteins with about 10% of the restraints using replica exchange.10,11 In this study, we showed that it is even possible to obtain accurate folds with 5% of the restraints using restrained simulated annealing. Because of the differences between the datasets, a direct comparison is not straightforward, but our data permitted detailed systematic evaluations. Many successful examples that utilize Rosetta algorithms and sparse NMR restraints in determining 3D structures have been published.21-23 Comparison or combined use with Rosetta remains an interesting topic. Our data provide a meaningful starting point in that direction.
References
- Guntert, P. Eur. Biophys. J. 2009, 38, 129. https://doi.org/10.1007/s00249-008-0367-z
- Lopez-Mendez, B.; Guntert, P. J. Am. Chem. Soc. 2006, 128, 13112. https://doi.org/10.1021/ja061136l
- Schmidt, E.; Guntert, P. J. Am. Chem. Soc. 2012, 134, 12817. https://doi.org/10.1021/ja305091n
- Ikeya, T.; Takeda, M.; Yoshida, H.; Terauchi, T.; Jee, J. G.; Kainosho, M.; Guntert, P. J. Biomol. NMR 2009, 44, 261. https://doi.org/10.1007/s10858-009-9339-6
- Ikeya, T.; Jee, J. G.; Shigemitsu, Y.; Hamatsu, J.; Mishima, M.; Ito, Y.; Kainosho, M.; Guntert, P. J. Biomol. NMR 2011, 50, 137. https://doi.org/10.1007/s10858-011-9502-8
- Guntert, P.; Mumenthaler, C.; Wüthrich, K. J. Mol. Biol. 1997, 273, 283. https://doi.org/10.1006/jmbi.1997.1284
- Brunger, A. T.; Adams, P. D.; Clore, G. M.; DeLano, W. L.; Gros, P.; Grosse-Kunstleve, R. W.; Jiang, J. S.; Kuszewski, J.; Nilges, M.; Pannu, N. S.; Read, R. J.; Rice, L. M.; Simonson, T.; Warren, G. L. Acta Crystallogr. D Biol. Crystallogr. 1998, 54, 905.
- Xia, B.; Tsui, V.; Case, D. A.; Dyson, H. J.; Wright, P. E. J. Biomol. NMR 2002, 22, 317. https://doi.org/10.1023/A:1014929925008
- Jee, J. Bull. Korean Chem. Soc. 2010, 31, 2717. https://doi.org/10.5012/bkcs.2010.31.9.2717
- Chen, J.; Im, W.; Brooks, C. L., III. J. Am. Chem. Soc. 2004, 126, 16038. https://doi.org/10.1021/ja047624f
- Chen, J.; Won, H. S.; Im, W.; Dyson, H. J.; Brooks, C. L., III. J. Biomol. NMR 2005, 31, 59. https://doi.org/10.1007/s10858-004-6056-z
- Hornak, V.; Abel, R.; Okur, A.; Strockbine, B.; Roitberg, A.; Simmerling, C. Proteins 2006, 65, 712. https://doi.org/10.1002/prot.21123
- Laskowski, R. A.; Rullmann, J. A.; MacArthur, M. W.; Kaptein, R.; Thornton, J. M. J. Biomol. NMR 1996, 8, 477.
- Davis, I. W.; Leaver-Fay, A.; Chen, V. B.; Block, J. N.; Kapral, G. J.; Wang, X.; Murray, L. W.; Arendall, W. B., III; Snoeyink, J.; Richardson, J. S.; Richardson, D. C. Nucleic Acids Res. 2007, 35, W375. https://doi.org/10.1093/nar/gkm216
- Shen, Y.; Delaglio, F.; Cornilescu, G.; Bax, A. J. Biomol. NMR 2009, 44, 213. https://doi.org/10.1007/s10858-009-9333-z
- Sugita, Y.; Okamoto, Y. Chem. Phys. Lett. 1999, 141.
- Fujiwara, K.; Tenno, T.; Sugasawa, K.; Jee, J. G.; Ohki, I.; Kojima, C.; Tochio, H.; Hiroaki, H.; Hanaoka, F.; Shirakawa, M. J. Biol. Chem. 2004, 279, 4760.
- Ohno, A.; Jee, J.; Fujiwara, K.; Tenno, T.; Goda, N.; Tochio, H.; Kobayashi, H.; Hiroaki, H.; Shirakawa, M. Structure 2005, 13, 521. https://doi.org/10.1016/j.str.2005.01.011
- Jee, J.; Ahn, H. C. Bull. Korean Chem. Soc. 2009, 30, 1139. https://doi.org/10.5012/bkcs.2009.30.5.1139
- Jee, J.; Mizuno, T.; Kamada, K.; Tochio, H.; Chiba, Y.; Yanagi, K.; Yasuda, G.; Hiroaki, H.; Hanaoka, F.; Shirakawa, M. J. Biol. Chem. 2010, 285, 15931. https://doi.org/10.1074/jbc.M109.075333
- Raman, S.; Lange, O. F.; Rossi, P.; Tyka, M.; Wang, X.; Aramini, J.; Liu, G.; Ramelot, T. A.; Eletsky, A.; Szyperski, T.; Kennedy, M. A.; Prestegard, J.; Montelione, G. T.; Baker, D. Science 2010, 327, 1014. https://doi.org/10.1126/science.1183649
- Lange, O. F.; Rossi, P.; Sgourakis, N. G.; Song, Y.; Lee, H. W.; Aramini, J. M.; Ertekin, A.; Xiao, R.; Acton, T. B.; Montelione, G. T.; Baker, D. Proc. Natl. Acad. Sci. USA 2012, 109, 10873. https://doi.org/10.1073/pnas.1203013109
- Thompson, J. M.; Sgourakis, N. G.; Liu, G.; Rossi, P.; Tang, Y.; Mills, J. L.; Szyperski, T.; Montelione, G. T.; Baker, D. Proc. Natl. Acad. Sci. USA 2012, 109, 9875. https://doi.org/10.1073/pnas.1202485109