A Study on the Drug Classification Using Machine Learning Techniques

머신러닝 기법을 이용한 약물 분류 방법 연구

  • Anmol Kumar Singh (School of Computer Engineering, Kalinga Institute of Industrial Technology)
  • Ayush Kumar (School of Computer Engineering, Kalinga Institute of Industrial Technology)
  • Adya Singh (School of Computer Engineering, Kalinga Institute of Industrial Technology)
  • Akashika Anshum (School of Computer Engineering, Kalinga Institute of Industrial Technology)
  • Pradeep Kumar Mallick (Kalinga Institute of Industrial Technology, India)
  • Received : 2024.05.23
  • Accepted : 2024.06.20
  • Published : 2024.06.30

Abstract

This paper presents a drug classification system whose goal is to predict the most suitable drug for a patient from demographic and physiological traits. The dataset includes the attributes Age, Sex, BP (blood pressure), Cholesterol level, and Na_to_K (sodium-to-potassium ratio), and the target is the type of drug administered. Three models are used: K-Nearest Neighbors (KNN), Logistic Regression, and Random Forest. Hyperparameters were tuned using GridSearchCV with 5-fold cross-validation, and each model was trained and tested on the dataset. Each model was evaluated both with and without hyperparameter tuning using accuracy, confusion matrices, and classification reports. The accuracies of the models were 0.7, 0.875, and 0.975 without GridSearchCV and 0.75, 1.0, and 0.975 with GridSearchCV. According to the GridSearchCV results, Logistic Regression is the most suitable of the three models for drug classification, followed by K-Nearest Neighbors. Na_to_K was also found to be an essential feature for predicting the outcome.
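
The workflow described in the abstract can be sketched with scikit-learn roughly as follows. This is a minimal sketch under stated assumptions, not the authors' code: the data are synthetic stand-ins generated with an invented labelling rule, and the hyperparameter grids, the 80/20 split, the random seeds, and the helper `make_pipeline_for` are hypothetical choices that the abstract does not specify. Only the three model families, 5-fold `GridSearchCV`, and the evaluation metrics come from the text.

```python
import numpy as np
from sklearn.base import clone
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (NOT the paper's dataset); the labelling rule is invented.
rng = np.random.default_rng(0)
n = 200
age = rng.integers(15, 75, n)
sex = rng.integers(0, 2, n)            # 0 = F, 1 = M
bp = rng.integers(0, 3, n)             # 0 = LOW, 1 = NORMAL, 2 = HIGH
chol = rng.integers(0, 2, n)           # 0 = NORMAL, 1 = HIGH
na_to_k = rng.uniform(6.0, 38.0, n)
# Column order: Age, Sex, BP, Cholesterol, Na_to_K
X = np.column_stack([age, sex, bp, chol, na_to_k]).astype(float)
y = np.where(na_to_k > 15, "drugY",
             np.where(bp == 2, "drugA", np.where(bp == 1, "drugX", "drugC")))

# Assumed 80/20 stratified split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)


def make_pipeline_for(clf):
    """Hypothetical helper: scale numeric columns, one-hot encode categorical ones."""
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), [0, 4]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), [1, 2, 3]),
    ])
    return Pipeline([("prep", preprocess), ("clf", clone(clf))])


# Illustrative grids only; the abstract does not list the values searched.
models = {
    "KNN": (KNeighborsClassifier(),
            {"clf__n_neighbors": [3, 5, 7, 9],
             "clf__weights": ["uniform", "distance"]}),
    "Logistic Regression": (LogisticRegression(max_iter=1000),
                            {"clf__C": [0.1, 1.0, 10.0, 100.0]}),
    "Random Forest": (RandomForestClassifier(random_state=0),
                      {"clf__n_estimators": [50, 100],
                       "clf__max_depth": [None, 5]}),
}

labels = np.unique(y)
results = {}
for name, (clf, grid) in models.items():
    # Baseline: default hyperparameters, no tuning.
    baseline = make_pipeline_for(clf).fit(X_train, y_train)
    base_acc = accuracy_score(y_test, baseline.predict(X_test))

    # Tuned: exhaustive grid search scored by 5-fold cross-validated accuracy.
    search = GridSearchCV(make_pipeline_for(clf), grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    tuned_pred = search.predict(X_test)
    tuned_acc = accuracy_score(y_test, tuned_pred)

    results[name] = {"baseline": base_acc, "tuned": tuned_acc,
                     "best_params": search.best_params_}
    print(f"{name}: baseline={base_acc:.3f}, tuned={tuned_acc:.3f}")
    print(confusion_matrix(y_test, tuned_pred, labels=labels))
    print(classification_report(y_test, tuned_pred, zero_division=0))
```

Because the data here are synthetic, the printed accuracies will not match the figures reported in the abstract. Wrapping preprocessing and classifier in one `Pipeline` means the scaler and encoder are refit inside each cross-validation fold, which keeps the held-out fold from influencing preprocessing.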
