De Novo Drug Design Using Self-Attention Based Variational Autoencoder

  • Received : 2021.06.23
  • Accepted : 2021.08.26
  • Published : 2022.01.31

Abstract

De novo drug design is the process of developing new drugs that interact with biological targets such as protein receptors. The traditional process consists of drug candidate discovery followed by drug development, and it takes more than 10 years to develop a single new drug. Deep learning-based methods are being studied to shorten this period and to find chemical compounds for new drug candidates efficiently. Many existing deep learning-based drug design models use recurrent neural networks to generate chemical entities represented as SMILES strings, but recurrent networks train slowly and have difficulty learning the complex rules of molecular formulas, which leaves room for improvement. To overcome these shortcomings, we propose a deep learning model that generates SMILES strings using a variational autoencoder with a self-attention mechanism. The proposed model reduced training time to 1/26 of that of the latest drug design model and also generated valid SMILES more effectively.
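To make the self-attention idea concrete, the following is a minimal sketch of scaled dot-product self-attention applied to the tokens of one SMILES string. It is not the proposed model: the tokenizer, the random 4-dimensional embeddings, and the choice Q = K = V (no learned projections) are illustrative assumptions, and the variational autoencoder itself is not shown. The point is that each token's output is computed from all other tokens at once rather than step by step, as in a recurrent network.

```python
import math
import random


def tokenize(smiles):
    """Split a SMILES string into tokens (hypothetical helper, simplified).

    Bracket atoms such as [NH4+] and the two-letter atoms Cl and Br are
    kept whole; every other character becomes its own token.
    """
    tokens, i = [], 0
    while i < len(smiles):
        if smiles[i] == "[":
            j = smiles.index("]", i)
            tokens.append(smiles[i:j + 1])
            i = j + 1
        elif smiles[i:i + 2] in ("Cl", "Br"):
            tokens.append(smiles[i:i + 2])
            i += 2
        else:
            tokens.append(smiles[i])
            i += 1
    return tokens


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x.

    Assumption: a real model would first apply learned linear projections
    to obtain Q, K, and V; they are omitted to keep the sketch short.
    """
    d = len(x[0])
    outputs, weights = [], []
    for q in x:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of all token vectors.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, x)) for c in range(d)])
    return outputs, weights


rng = random.Random(0)
tokens = tokenize("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
# Assumption: random embeddings stand in for learned token embeddings.
embed = {t: [rng.gauss(0, 1) for _ in range(4)] for t in sorted(set(tokens))}
x = [embed[t] for t in tokens]
outputs, weights = self_attention(x)
```

Here `weights[i][j]` is how strongly token `i` attends to token `j`, so a ring-opening `1` can place weight directly on its matching ring-closing `1` regardless of the distance between them. Checking whether a generated string is a chemically valid SMILES requires a cheminformatics toolkit and is outside the scope of this sketch.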

Acknowledgement

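This research was supported in 2021 by a National Research Foundation of Korea grant funded by the Korean government (Ministry of Science and ICT) (NRF-2019R1A2C3005212, Elucidating the resistance mechanisms of targeted anticancer drugs for liver cancer using deep learning and searching for new targeted anticancer drugs to overcome them) and by the Smart City Innovative Talent Development Program of the Ministry of Land, Infrastructure and Transport.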
