Data-centric XAI-driven Data Imputation of Molecular Structure and QSAR Model for Toxicity Prediction of 3D Printing Chemicals

  • ChanHyeok Jeong (Integrated Engineering, Department of Environmental Science and Engineering, College of Engineering, Kyung Hee University)
  • SangYoun Kim (Integrated Engineering, Department of Environmental Science and Engineering, College of Engineering, Kyung Hee University)
  • SungKu Heo (Integrated Engineering, Department of Environmental Science and Engineering, College of Engineering, Kyung Hee University)
  • Shahzeb Tariq (Integrated Engineering, Department of Environmental Science and Engineering, College of Engineering, Kyung Hee University)
  • MinHyeok Shin (Integrated Engineering, Department of Environmental Science and Engineering, College of Engineering, Kyung Hee University)
  • ChangKyoo Yoo (Integrated Engineering, Department of Environmental Science and Engineering, College of Engineering, Kyung Hee University)
  • Received : 2023.05.23
  • Accepted : 2023.09.01
  • Published : 2023.11.01

Abstract

As 3D printers become more accessible, exposure to chemicals associated with 3D printing is becoming more frequent. However, research on the toxicity and hazards of chemicals generated by 3D printing remains insufficient, and toxicity prediction with in silico techniques is limited by missing molecular structure data. In this study, a quantitative structure-activity relationship (QSAR) model based on a data-centric AI approach was developed to predict the toxicity of new 3D printing materials by imputing missing values in their molecular descriptors. First, the MissForest algorithm was used to impute missing values in the molecular descriptors of hazardous 3D printing materials. Then, machine learning (ML)-based QSAR models built on four algorithms (decision tree, random forest, XGBoost, and support vector machine (SVM)) were developed to predict the bioconcentration factor (Log BCF), the octanol-air partition coefficient (Log Koa), and the partition coefficient (Log P). Furthermore, the reliability of the data-centric QSAR model was validated with Tree-SHAP (SHapley Additive exPlanations), an explainable artificial intelligence (XAI) technique. The proposed MissForest-based imputation yielded approximately 2.5 times as much molecular structure data as the original dataset. Trained on the imputed molecular descriptor dataset, the data-centric QSAR model achieved prediction performance of approximately 73%, 76%, and 92% for Log BCF, Log Koa, and Log P, respectively. Lastly, Tree-SHAP analysis showed that the model achieved its high prediction performance by relying on key molecular descriptors that are highly correlated with the toxicity indices. Therefore, the proposed data-centric XAI-based QSAR approach can be extended to predict the toxicity of potential pollutants from other printing materials, chemical processes, and semiconductor or display processes.
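The MissForest imputation step can be illustrated with a minimal pure-Python sketch. It follows the procedure published by Stekhoven and Bühlmann: fill every gap with its column mean, then repeatedly regress each column that has missing entries on all other columns with a random forest, visiting columns from least to most missing, and stop as soon as the normalized change between successive imputations increases for the first time, returning the previous imputation. Everything specific here is an assumption for illustration, not the authors' implementation: the function names (`missforest`, `_grow_tree`, `_fit_forest`), the hyperparameters (100 trees, depth limit 6, minimum split size 4, mtry = floor(sqrt(p))), the use of `None` for missing cells, and the restriction to continuous descriptors (the original algorithm also handles categorical variables).

```python
import math
import random
import statistics

# Minimal MissForest-style imputer: an illustrative sketch, not the authors' code.
# Assumptions: all columns are numeric (continuous), missing entries are None,
# and every column has at least one observed value.


def _sse(values):
    """Sum of squared deviations from the mean (regression-tree impurity)."""
    m = statistics.fmean(values)
    return sum((v - m) ** 2 for v in values)


def _grow_tree(X, y, rng, n_try, depth=0, max_depth=6, min_split=4):
    """CART-style regression tree that tries n_try random features per node."""
    if depth >= max_depth or len(y) < min_split or len(set(y)) == 1:
        return statistics.fmean(y)  # leaf: mean of the node's targets
    best = None  # (impurity, feature index, threshold)
    n_feat = len(X[0])
    for f in rng.sample(range(n_feat), min(n_try, n_feat)):
        levels = sorted({row[f] for row in X})
        for lo, hi in zip(levels, levels[1:]):
            thr = (lo + hi) / 2
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue  # a threshold that rounds onto a level may empty one side
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, thr)
    if best is None:  # no candidate feature varies, so the node stays a leaf
        return statistics.fmean(y)
    _, f, thr = best
    children = []
    for goes_left in (True, False):
        idx = [i for i, row in enumerate(X) if (row[f] <= thr) == goes_left]
        children.append(_grow_tree([X[i] for i in idx], [y[i] for i in idx],
                                   rng, n_try, depth + 1, max_depth, min_split))
    return (f, thr, children[0], children[1])


def _predict_tree(node, x):
    while isinstance(node, tuple):  # internal nodes are (feature, thr, left, right)
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node


def _fit_forest(X, y, rng, n_trees):
    """Random forest: bootstrap-resampled trees, mtry = max(1, floor(sqrt(p)))."""
    n, n_try = len(y), max(1, math.isqrt(len(X[0])))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(_grow_tree([X[i] for i in idx], [y[i] for i in idx], rng, n_try))
    return forest


def missforest(data, n_trees=100, max_iter=10, seed=0):
    """Impute None entries of a list-of-rows matrix, MissForest style."""
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    miss = [[v is None for v in row] for row in data]
    X = [row[:] for row in data]
    for j in range(p):  # initial guess: mean of the observed values
        mean_j = statistics.fmean(row[j] for row in data if row[j] is not None)
        for i in range(n):
            if miss[i][j]:
                X[i][j] = mean_j
    # Visit columns from the fewest to the most missing entries.
    order = sorted(range(p), key=lambda j: sum(miss[i][j] for i in range(n)))
    prev_delta = math.inf
    for _ in range(max_iter):
        X_old = [row[:] for row in X]
        for j in order:
            mis = [i for i in range(n) if miss[i][j]]
            if not mis:
                continue
            obs = [i for i in range(n) if not miss[i][j]]
            others = [k for k in range(p) if k != j]
            forest = _fit_forest([[X[i][k] for k in others] for i in obs],
                                 [X[i][j] for i in obs], rng, n_trees)
            for i in mis:  # overwrite only the originally missing cells
                x_i = [X[i][k] for k in others]
                X[i][j] = statistics.fmean(_predict_tree(t, x_i) for t in forest)
        # Normalized change between successive imputations.
        num = sum((X[i][j] - X_old[i][j]) ** 2 for i in range(n) for j in range(p))
        den = sum(X[i][j] ** 2 for i in range(n) for j in range(p))
        delta = num / den if den else 0.0
        if delta >= prev_delta:  # first increase: keep the previous imputation
            return X_old
        prev_delta = delta
    return X
```

The pure-Python trees only keep the sketch self-contained; in practice a library random forest would take the place of `_fit_forest`, while the mean initialization, column ordering, and stopping rule stay the same.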

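For reference, the three regression targets are base-10 logarithms of equilibrium partition ratios. They are not defined here, so the standard textbook forms are assumed below, with Log P taken to be the octanol-water partition coefficient (Log Kow) and K_AW the dimensionless air-water partition coefficient:

```latex
\begin{aligned}
\log \mathrm{BCF} &= \log_{10}\frac{C_{\mathrm{organism}}}{C_{\mathrm{water}}} \quad \text{(at steady state)} \\
\log K_{\mathrm{OA}} &= \log_{10}\frac{C_{\mathrm{octanol}}}{C_{\mathrm{air}}} \\
\log P = \log K_{\mathrm{OW}} &= \log_{10}\frac{C_{\mathrm{octanol}}}{C_{\mathrm{water}}} \\
\log K_{\mathrm{OA}} &\approx \log K_{\mathrm{OW}} - \log K_{\mathrm{AW}}
\end{aligned}
```

The last line is only approximate because the octanol phase in an octanol-water system is water-saturated rather than pure octanol.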
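Tree-SHAP attributes each prediction to the input descriptors through Shapley values, φ_j = Σ over S ⊆ F∖{j} of |S|!(p − |S| − 1)!/p! · [v(S ∪ {j}) − v(S)], where p is the number of descriptors. The sketch below computes these values by brute-force enumeration of every coalition S, which is exponential in p; Tree-SHAP obtains exact values for tree ensembles in polynomial time by exploiting the tree structure instead. The value function v(S) used here, the mean model output over a small background set with the features in S fixed to the query point, is an assumption for illustration (the original Tree-SHAP estimates this expectation from the training-sample counts stored in the trees). The function name `shapley_values` and the toy model are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, background):
    """Exact Shapley values of model f at point x by enumerating all coalitions.

    Value function (an assumption for this sketch): v(S) is the mean of f over
    the background rows, with the features in S fixed to their values in x.
    Cost grows as 2**p * len(background), so keep p small.
    """
    p = len(x)

    def v(S):
        total = 0.0
        for z in background:
            mixed = [x[k] if k in S else z[k] for k in range(p)]
            total += f(mixed)
        return total / len(background)

    phi = []
    for j in range(p):
        others = [k for k in range(p) if k != j]
        contrib = 0.0
        for size in range(p):  # coalition sizes 0 .. p-1 drawn from the others
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for S in combinations(others, size):
                coalition = set(S)
                contrib += weight * (v(coalition | {j}) - v(coalition))
        phi.append(contrib)
    return phi


if __name__ == "__main__":
    # Toy depth-2 tree: predicts 1 only when both descriptors exceed 0.5.
    def stump_and(z):
        return 1.0 if z[0] > 0.5 and z[1] > 0.5 else 0.0

    bg = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(shapley_values(stump_and, [1.0, 1.0], bg))  # [0.375, 0.375]
```

A useful sanity check is the efficiency property: the attributions sum to f(x) minus the mean prediction over the background set (here 1.0 − 0.25 = 0.75), which is what makes SHAP explanations additive.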

Acknowledgement

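This research was supported by the fourth-stage BK21 program (BK21 FOUR) of the National Research Foundation and by a National Research Foundation of Korea (NRF) grant funded by the Ministry of Science and ICT (No. 2021R1A2C2007838). We are grateful for this support.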
