Novel DOT1L ReceptorNatural Inhibitors Involved in Mixed Lineage Leukemia: a Virtual Screening, Molecular Docking and Dynamics Simulation Study

Background: The human protein methyl-transferase DOT1L catalyzes the methylation of histone H3 on lysine 79 (H3K79) at homeobox genes and is also involved in a number of significant processes ranging from gene expression to DNA-damage response and cell cycle progression. Inhibition of DOT1L activity by shRNA or small-molecule inhibitors has been established to prevent proliferation of various MLL-rearranged leukemia cells in vitro , establishing DOT1L an attractive therapeutic target for mixed lineage leukemia (MLL). Most of the drugs currently in use for the MLL treatment are reported to have low efficacy, hence this study focused on various natural compounds which exhibit minimal toxic effects and high efficacy for the target receptor. Materials and Methods: Structures of human protein methyl-transferase DOT1L and natural compound databases were downloaded from various sources. Virtual screening, molecular docking, dynamics simulation and drug likeness studies were performed for those natural compounds to evaluate and analyze their anti-cancer activity. Results: The top five screened compounds possessing good binding affinity were identified as potential high affinity inhibitors against DOT1L’s active site. The top ranking molecule amongst the screened ligands had a Glide g-score of -10.940 kcal/mol and Glide e-model score of -86.011 with 5 hydrogen bonds and 12 hydrophobic contacts. This ligand’s behaviour also showed consistency during the simulation of protein-ligand complex for 20000 ps, which is indicative of its stability in the receptor pocket. Conclusions: The ligand obtained out of this screening study can be considered as a potential inhibitor for DOT1L and further can be treated as a lead for the drug designing pipeline.


Introduction
The Mixed-Lineage Leukemia (MLL) is an extremely fatal blood cancer that predominantly happens in pediatric patients. The modifications of the MLL gene which are present in approx. 75% of infants and about 10% of adults (Daser et al., 2005;Krivtsov et al., 2007;Liedtke et al., 2009) leads to MLL. Regardless of the phenotypic distinction, the gene expression profiles for MLLrearranged leukemias of both acute lymphocytic leukemia and acute myeloid leukemia lineage overlaps (Ross et al., 2004). This sort of intense leukemia has an especially poor prognosis (Chen et al., 1993;Felix et al., 1995), with 5-year occasion free survival with MLL-rearranged acute lymphocytic leukemia being just ~40%. Hence, there occurs a pressing need to discover new medication The human protein methyl-transferase DOT1L has been discovered to be a focus for leukemia with MLL translocations. ENL, AF4, AF9 and AF10 are the four most basic MLL fusion partners which were reportedly Bioinformatics, Information Technology, Indian Institute of Information Technology, Allahabad, India *For correspondence:pritish@ iiita.ac.in

Novel DOT1L ReceptorNatural Inhibitors Involved in Mixed
Lineage Leukemia: a Virtual Screening, Molecular Docking and Dynamics Simulation Study Utkarsh Raj, Himansu Kumar, Saurabh Gupta, Pritish Kumar Varadwaj* have the capacity to bind the DOT1L protein as a feature of the transcription protein complexes (Okada et al., 2005;Zhang et al., 2006;Mueller et al., 2007;Bitoun et al., 2007;Krivtsov et al., 2008;Mueller et al., 2009). These fusion partners as directed by the AT hook domain of the MLL-rearranged gene, were proposed to structure a large complex which binds to the promoter regions of certain MLL directed genes viz. Hoxa9 (Bitoun et al., 2007). Methylation of H3K4 doesn't occur if the C-terminal of these MLL targeted fusion partners don't possess a SET domain. Rather, the recruited DOT1L methylates H3K79 that can also initiate and maintain transcriptions of the gene as shown in Figure 1. This unusual epigenetic event leads to the dysregulation of the several MLL targeted genes viz. Meis1, Hoxa7 and Hoxa9, which if overexpressed are known to cause leukemia (Okada et al., 2005).
Since DOT1L is the main H3k79 methyltransferase whose contribution in MLL leukemogenesis is clear. Moreover, the transformation of normal progenitor cells into leukemic stem cells by an artificial fused MLL-DOT1L gene was also reported. However, the failure occurs in the case when MLL fused with a mutant DOT1L which didn't possess methyltransferase activity. This further resulted in establishing the role of the activity of methyltransferase in causing this type of leukemia (Okada et al., 2005). In addition, a high level of the activity of the methyltransferase DOT1L is required for the transformation and maintenance of MLL leukemia. It was also found that the transforming ability of MLL-AF4 and MLL-AF10 got nullified due to the 50%-70% declination in the expression of DOT1L caused by the RNA interference (Okada et al., 2005;Krivtsov et al., 2008). An alternate fascinating application focusing on DOT1L is its role in the programming of cells. Recent studies on the inhibition of DOT1L by utilizing small molecule inhibitor as well as RNA interference have also been reported to significantly improve the reprogramming efficacy to develop induced pluripotent stem cells (iPSC) from somatic cells (Onder et al., 2012). These outcomes have validated the key role of DOT1L in leukemogenesis of MLL-rearranged leukemia and thus signifies a target for therapeutic intervention (Mc Lean et al., 2014).
It has been realized that numerous illnesses including tumor are produced by abnormal epigenetic alterations. By taking into account the vital role of DOT1L or H3K79 methylation in the regulation of gene, it is expected that more infections or therapeutic applications may be associated with aberrant behavior of DOT1L. On the other hand, with the exception of MLL-rearranged leukemia, there is not any convincing proof indicating potential therapeutic applications of focusing on DOT1L.The recent demonstration of DOT1L enzymatic activity as a driver of MLL-rearranged leukemia (Bernt et al., 2011;Daigle et al., 2011) provides a convincing basis for the development of effective, selective DOT1L inhibitors as therapeutic agents for the cure of MLL-rearranged leukemia patients. Till now, that standard inhibitors reported for histone methyltranferases mainly DOT1L receptor are CHAETOCIN (Papovic et al., 2012), DZNep (Helin et al., 2013), ENTACAPONE (Forsberg et al., 2003), EPZ-004777 , EPZ-5676 (Daigle et al., 2013), EPZ-6438 (Knutson et al., 2013), EPZ-005687 (Tummino et al., 2014), GSK-126 (Verma, 2015), OMEGA-3 FATTYACID (Dimri et al., 2010), P35 (Cacabelos, 2014), TANSHINDIOLS (Woo et al., 2014), CURCUMIN (Lu et al., 2013), EPZ000004 (Basavapathruni et al., 2012), EPZ003696 (Basavapathruni et al., 2012), 5-IODOTUBERCIDIN 004777 , FED1 and FED2 004777 , SAH and SAM 004777 . But these current treatment options are of limited effectiveness; hence, the development of novel small molecule inhibitors of DOT1L is necessary. Not only can these compounds be further developed to become clinically useful therapeutics for MLL leukemia (Chen, 2014), they may also serve as powerful chemical probes to investigate the biology of DOT1L in cancer and other diseases .
In this study, we employed exhaustive virtual screening and molecular docking studies to report natural inhibitors for DOT1L which more potent than the existing reference inhibitors (Raj et al., 2015). The critical residues in DOT1L methyltransferase were identified and validated by docking and simulation results of the screened inhibitor in order to elucidate their roles in causing this chronic disease. Further, the Absorption, Distribution, Metabolism, Excretion (ADME) and Toxicity properties were calculated in order to check the potency of these compounds. These results provides a clear demonstration of the potential for potent, selective small molecule inhibitors of DOT1L to affect selective killing of MLLrearranged leukemias.

Ligand selection and preparation
Different databases of compounds were downloaded for the structure based drug design namely: 1) Natural product library with 400 compounds [http://www.specs.net/ snpage.php? spageid=home], 3) NDL (Natural derivatives library) database, Flavonoids library and extended flavonoids library of 3040, 500 and 4000 compounds respectively [http://www.timtec.net] (Radha M et al., 2014), 4) NATx (Semi-Synthetic screening compounds) library, MACROx (Next Generation Macrocycles) library, MEGx_new (Natural products) and FRGx (Fragments from nature) library of 1307, 52, 652 and 59 compounds respectively [http://www.ac.discovery.com], 5) Diversity Set IV library and Natural Set II library of 1596 and 120 compounds respectively [http://www.dtp.nci.nih.gov/ repositories.html], 7) IBS Natural compounds library, Natural compounds scaffold library, bioactive compounds library of 55000, 503 and 761 compounds respectively [http://www.ibscreen.com], 8) Ten Specialized databases containing 31805 natural products and metabolites available at zinc [http://zinc.docking.org/browse/catalogs/ natural-products], 9) 20 Well-known DOT1L inhibitors. All the ligand structures in the libraries were in 2D SDF format which were first converted to 3D for docking. Geometry minimizations were performed on all the ligands using the OPLS 2005 force fields and Truncated Newton Conjugate gradient (TNCG). The ligands were first formed by ligprep with Epik to expand protonation and tautomeric states at 7.0+2.0 Ph units (Shaw et al., 2005;Kresten et al., 2010). For each ligand five low energy stereoisomers were generated and a single low-energy 3D structure with correct chiralities was retained. Qikprop was used for the screening of the ligands and further to calculate the absorption, distribution, metabolism, and excretion (ADME) properties (Kresten et al., 2010). Both, physicochemically significant descriptors and pharmacokinetically relevant properties were predicted for all the ligands from the compound libraries. Lipinski's rule of five was one of the factors used to assure the drug like (oral) pharmacokinetic profile of the ligands.

Active site identification of DOT1L
The catalytic site of human protein methyl-transferase DOT1L was identified by using SiteMap, version 3.0, Schrodinger, LLC, New York, NY, 2014 (Halgren et al., 2009). Active sites with best site scores (top ranked potential receptor binding cavity) were taken as a prerequisite for receptor grid generation. The active sites identified by this program have also been validated with the active site reported in literature available for this receptor. After preparation of protein and identification of active site, receptor grid was generated for the protein by Grid Generation panel of Glide module of version 6.2, Schrodinger, LLC, New York, NY, 2014. Grid dimension for DOT1L protein was as follows: Inner box: X=10, Y=10, Z=12, Outer box: X=22.618795, Y=22.618795, Z=22.618795, Grid center X= 18.042143, Y=62.62294, Z= -9.56231.

In silico screening and molecular docking
The virtual screening and docking analysis of Human methyl-transferase DOT1L with 99795 (excluding 20 reference inhibitors) compounds was carried out by GLIDE v6.2, Schrodinger, LLC, New York, NY, 2014. Virtual screening is a technique by which small molecules are identified from the large dataset and which are most likely to bind the targeted protein (Rajamani et al., 2007). Generated dataset was screened at the active site of SH2 domains of both proteins. Screening process of Glide includes three steps High Throughput Virtual Screening (HTVS), Standard Precision (SP) and Extra precision (XP). Through this rigorous docking based in-silico filtering, we obtained some compounds with significant glide and docking score. Some of the well-known inhibitors like CHAETOCIN, DZNep, ENTACAPONE, EPZ-004777, EPZ-5676 etc., were collected as reference molecules. Molecular docking was also performed with these molecules for the same receptor grid for comparative analysis by using GLIDE v6.2, Schrodinger, LLC, New York, NY, 2014.

Molecular dynamics simulation for the top ranked complex
Desmond was used to carry out molecular dynamics (MD) simulation on the complex structure (DOT1L receptor with the top ranked ligand), which utilizes a definite "neutral territory" midpoint method to make efficient utilization of a high degree of computational parallelism (Bowers et al., 2006). The Optimized Potentials for Liquid Simulations (OPLS) 2005 forcefield was used to model the amino acid interactions in the modeled protein, and the SPC (simple point charge) method was used for the water model. Default parameters and protocol of Desmond were used to equilibrate the system which consists of steps of restrained minimizations and simulations which are used for relaxing the system with no significant deviation from the initial coordinates of the modeled protein. The initial coordinates for the molecular dynamics simulation were taken from the modeled protein. The SPC water molecules were then added (the orthorhombic dimensions of each water box were 9Å x 11Å x 12 Å approximately, which ensured that the whole surfaces of the complexes were covered). 11 Na+ counter ions were added to the system for neutralization in order to balance the net charge of the system. Before equilibration and the long production MD simulations, the systems were minimized for 5000 iterations on a convergence threshold of 1kcal/mol/Å and pre-equilibrated by means of the inbuilt relaxation protocol implemented in Desmond. The whole system was subjected to 300 K for 20000 ps at NPT ensemble of MD simulation with a recording interval of 100 ps for total energy. The structural changes and dynamic behaviour of the modeled protein were analyzed by calculating the root mean square deviation (RMSD) and potential energy.

Calculation of ADME and Toxicity
QikProp, v3.9, Schrodinger, LLC, New York, NY, 2014 was used for calculation of ADME properties of the top five inhibitors such as Molecular Weight (MW), hydrophobicity, hydrophilicity, solvent accessible surface area, molecular volume, number of rotatable bond, donorhydrogen bonds, acceptor hydrogen bonds etc.
OSIRIS property explorer was used for toxicity prediction of top five inhibitors of DOT1L protein (http://www.organic-chemistry.org/prog/peo/). OSIRIS property explorer is an online toxicity prediction tool for the analysis of drug relevant property like Mutagenic, Tumorigenic, Irritant, Reproductive effective, cLogP, Solubility, Molecular Weight etc. All the five screened top ranked inhibitors were also selected for the analysis of druglikeness (predicted bioactivity) properties via

Structure of the target protein
Human methyl-transferase (DOT1L) has been exploited as a main therapeutic target for Mixed-Lineage Leukemia.

Analysis of the catalytic site of the DOT1L
Human methyl-transferase (DOT1L) catalytic site predictions were analyzed using SiteMap, version 3.0, Schrodinger, LLC, New York, NY, 2014. SiteMap has detected the largest volume of 527.8 Å3 in the top most active site of ABL protein on the basis of site score. The active sites identified by this program have been in accordance with the active site reported in literature available for this receptor.

In-silico screening and molecular docking analysis
The complete set of 99795 compounds (excluding 20 reference inhibitors) were subjected to three levels of virtual screening HTVS, SP and XP which resulted in 11525 compounds, 1212 compounds and 125 compounds respectively. Initially the compounds with high molecular weight (MW >500) and with reactive functional groups were filtered out. The final compounds were then docked into the active site of DOT1L protein, using earlier mentioned three stages of screening and docking. The twenty reported inhibitors were docked on the active site of DOT1L receptor directly utilizing XP-docking technique.
The Glide score was selected as the scoring function and the E-model (Kcal/mol) was used to rank the poses of the ligands as the E-model combines the Glide score, non-bonded interaction energy and excess internal energy of generated ligand conformation for flexible docking. Similarly, docking results for the reported inhibitors are also reported in Table 1 for the comparative purpose.
The docking studies indicated that the top ranked compounds showed strong hydrogen bonding interactions with the receptor. On the basis of docking score, glide g-score and glide e-model value, compound id's named C1-C5, are more effective against all the human methyltransferase (DOT1L) receptor as compared to the reported inhibitors as depicted in Table 1. These top five screened compounds could be useful for identification and development of new preventive and therapeutic drug against Mixed-Lineage Leukemia.
Top ranked compound C1 forms two H-bonds with the ASN 122 of the receptor (One with the backbone and other with the side chain of the amino acid residue) as shown in Figure 2. C1 also forms two hydrogen bond interactions with the side chain of GLN 168 and one backbone H-bond with the ASP 122. There is also one Π-Π stacking interaction with the PHE 223 amino acid residue. Polar residues ASN 127, SER 132, SER 140, SER 164 and ASN 241 also plays an important role in binding of the C1. C1 also forms two '-vely' charged interactions with GLU 129 and GLU 186 and one '+vely' charged interaction with LYS 187 amino acid residue as indicated in Table 2. Since the screened ligand binds to the same active site residues as reported , hence it can be safely assumed that C1 also inhibits the methylation of H3K79, which in-turn inhibits MLL-fusion protein mediated transcription.

Molecular dynamics simulation results
Molecular dynamics (MD) simulation was used to explore dynamic perturbations in the conformation of the top ranked complex structure. Here, we have shown the detailed analysis of MD simulation results of the top ranked compound (C1) with DOT1L receptor (GLIDE g-score=-10.940 and GLIDE e-model=-86.011). The initial energy of the complex was -146668kcal/mol. During the course of 20 ns simulation the potential energy shows a decrease and after 10 ns, it was found that the graph shows not much significant differences which is indicative of stabilization of the system as shown in Figure 3.
The conformations obtained during the 20000 ps simulation for the top ranked complex were analyzed using the simulation interaction diagram option of the Desmond program. Root Mean Square Deviation (RMSD) is used to measure the average change in displacement of a selection of atoms for a particular frame with respect to a reference frame. It is calculated for all frames in the trajectory. RMSD was calculated for the protein C-alpha and side chain atoms during the course of 20000 ps MD simulation and the ligand RMSD plot was also fitted on the protein RMSD to check the overall stability of the system as shown in Figure 4.
It is clear from the RMSD graph of the top ranked DOT1L-C1 complex that there was a slight change in conformation by 1.5 Å at very beginning of simulation, after which stability was attained, and the range of backbone atoms varied from 1.000 to 3.963 A throughout the MD simulation as shown in Figure 4. The RMSD values of the C-alpha and side chain atoms in the system  are likely to converge after 15000 ps, showing deviation of around 0.5 Å but later on it reaches a plateau with a decrease in deviation to 0.292Å. In the similar manner, the ligand fit protein graph also showed similar behaviour during the simulation process.
Likewise Root Mean Square Fluctuation (RMSF) for the protein was also calculated to decipher the perturbation in the amino acid residues of the protein. Root Mean Square Fluctuation (RMSF) is useful for characterizing local changes along the protein chain in order to elucidate which residues are more important in the activation of the receptor. For most residues, RMSF values were within the range of 2 Å and few residues exceeded the 3 Å value. Generally the tails (N-and C-terminal) fluctuate more than any other part of the protein. Secondary structure elements like alpha helices and beta strands are usually more rigid than the unstructured part of the protein, and thus fluctuate less than the loop regions. In this case also, the same result was obtained which is indicative of the stability of the system. The amino acid residues viz. ASN 126, GLU 129, ASP 161, ASP 222 and PHE 223 play a key role in the regulation of DOT1L receptor as more than  Five properties were analyzed to explain the stability of the top ranked ligand in the DOT1L receptor during the simulation of 20 ns as shown in Figure 5: (1) Ligand RMSD -Root mean square deviation of a ligand with respect to the reference conformation (typicallythe first frame is used as the reference and it is regared as time t=0); (2) Radius of Gyration (rGyr) -It is used for measuring the 'extendedness' of a ligand, and is equivalent to its principal   From the graph, it is evident that Ligand RMSD remains constant during the simulation process. The overall RMSD of the ligand C1 was below 2 Å. The ligand C1 in the initial stage shows fluctuation from 0 ps to 1,300 ps but later maintains constant RMSD during the whole simulation. This analysis indicates clearly that both compounds maintained constant RMSD during the simulation. Initially there was a fluctuation in ROG for the ligand C1 until 12 ns and then a stable conformation was acquired throughout the simulation. The radius of gyration throughout the 20 ns simulation for the ligand C1 ranged from 5.0 Å to 5.6 Å. MolSA, SASA and PSA plots also indicated the consistency of the ligand mainly after 11-12 ns during the simulation process.

ADMET predictions
ADMET properties are considered to be one of the most important criteria in order to determine true potential of the compound (Puzyn et al., 2010). About 50% of drugs are found to be unsuccessful due to the toxicity of compound during the clinical trials. Therefore, in order to save the expenses at the clinical trial stage the in silico toxicity prediction tools are used. Physically critical descriptors and pharmaceutically significant properties of these five lead compounds were investigated utilizing Qikprop program as depicted in Table 3. All the five screened compounds were in the acceptable range of Lipinski's rule of five. For these five screened lead compounds, the partition coefficient (QP logP (o/w)) and the water solubility (QP log S), which are essential when measuring the absorption and distribution of drugs within the body, ranged between -0.041 to 3.096 and -0.384 to -5.591, respectively, while the cell permeability (QP PCaco), a crucial factor governing drug metabolism and its access to biological membranes, ranged from 21.008 to 136.735. Overall, the percentage human oral absorptions for these five compounds ranged from 50% to 75%.
Toxicity Risk Assessment pannel of the Osiris Property Explorer (OPE) was used to predict toxic proerty of selected natural inhibitors as summarized in Table 3. OPE use to predict toxic property by analyzing available molecular toxic property in the Registry of Toxic Effects of Chemical Substances (RTECS) database. RTECS, a Accelrys database which is a repository of chemical structures and their toxic property.
By analyzing the results of toxicity predictor, all five screened compounds were non mutagenic, nontumorigenic, nonirritant and non-reproductive. The predicted bioactivities of these screened compounds for acting as GPCR ligand Ion, Channel Modulator, Kinase Inhibitor, Nuclear Receptor Ligand, Protease Inhibitor and Enzyme Inhibitor calculated through MOLINSPIRATION server were also depicted in Table 3. As a general thumb rule, larger is the bioactivity score, higher is the probability that investigated compound will be active. Therefore, a molecule having bioactivity score more than 0.00 is most likely to possess considerable biological activities, while values -0.50 to 0.00 are expected to be moderately active and if score is less than -0.50 it is presumed to be inactive (Verma et al., 2012).The results of the current study revealed that the screened natural compounds are biologically active molecules and will produce the physiological actions by multiple mechanisms after interacting with GPCR ligands, nuclear receptor ligands, and inhibit protease and other enzymes. Though bioactivity score for Enzyme inhibition is found to be >0.00 for all tested compounds except compound C4, but the highest score (0.48) was observed for compound C5 closely followed by compound C3 (0.40) and compound C1 (0.30).

Discussion
In conclusion, in our study we have employed virtual screening, molecular docking and dynamics simulation studies in order to find the natural inhibitors for DOT1L receptor which can be more potent in curing MLL. It was observed that the top five screened natural compounds shows minimum binding energy and better docking score with the human methyl-transferase (DOT1L) receptor. The molecular dynamics study of the best ligand-receptor complex (Compound C1 with DOT1L) also exhibit the stability of the compound during the 20 ns simulation process which validate the virtual screening and docking results. Interactions with the amino acid residues ASN 126, GLU 129, ASP 161, ASP 222 and PHE 223 remains consistent during the MD simulation process. The screened ligand acts apparently by inhibiting methylation of H3K79 thereby preventing the aberrant RNA poly2 mediated transcriptional elongation. For further validation of these five natural screened compounds, predicted bioactivities, ADME properties and their toxicity results indicates their drug likeliness. Thus, we can conclude that these five natural compounds especially compound C1 ((3R,5R)-5-(acetyloxy)-1,7-bis(3,4-dihydroxyphenyl) heptan-3-yl acetate) can be used either as potential lead to synthesize new drugs for DOT1L receptor involved in MLL or could be introduced into clinical trials as a fire new drug for the treatment of mixed lineage leukemia.