A Method of Machine Learning-based Defective Health Functional Food Detection System for Efficient Inspection of Imported Food

  • Lee, Kyoungsu (Department of Big Data Analytics, Kyung Hee University) ;
  • Bak, Yerin (Department of Big Data Analytics, Kyung Hee University) ;
  • Shin, Yoonjong (School of Management, Kyung Hee University) ;
  • Sohn, Kwonsang (School of Management, Kyung Hee University) ;
  • Kwon, Ohbyung (School of Management, Kyung Hee University)
  • Received : 2022.08.22
  • Accepted : 2022.09.15
  • Published : 2022.09.30

Abstract

As interest in health functional foods has grown since COVID-19, imported food safety inspections have become increasingly important. However, while imports of health functional foods rise every year, the budget and staff available for import/export inspections are reaching their limits. The purpose of this study is therefore to examine the characteristics of government import/export inspection data on health functional foods and to propose a design for an automated, machine learning-based system that efficiently detects nonconforming foods while accounting for both classification accuracy and the explainability of its results. First, from the components of the food inspection data that influence nonconformity judgments, we create new derived variables that are significant for those judgments. Second, we select features through an exploratory analysis of the health functional food import/export inspection data that takes class imbalance and nonlinearity into account. Third, we apply various machine learning techniques and compare the resulting models in terms of performance and interpretability. The ensemble model performed best, and the results confirm that the derived variables and models proposed in this study can support the system currently used in import/export inspections.
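The abstract does not list the derived variables themselves. As a minimal sketch of one plausible kind, assuming a hypothetical record schema of (manufacturer, inspection date, nonconformity label), a manufacturer's past nonconformity rate can be computed from earlier inspections only, with additive smoothing toward the overall past rate so that rarely inspected manufacturers do not receive extreme values. The function name, the schema and the `alpha` parameter are illustrative assumptions, not the study's implementation.

```python
from collections import defaultdict
from itertools import groupby


def prior_nonconformity_rate(records, alpha=1.0):
    """Leakage-free derived variable (hypothetical): for each inspection
    record, the smoothed share of earlier inspections of the same
    manufacturer that were judged nonconforming.

    records: list of (manufacturer, iso_date, label), label 1 for
             nonconforming and 0 for conforming (assumed schema).
    alpha:   pseudo-count (> 0) shrinking manufacturers with few past
             inspections toward the overall past nonconformity rate.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    fails = defaultdict(int)  # past nonconforming count per manufacturer
    seen = defaultdict(int)   # past inspection count per manufacturer
    total_fails = total_seen = 0
    rates = [0.0] * len(records)

    # Visit records in date order; ISO date strings sort chronologically.
    order = sorted(range(len(records)), key=lambda i: records[i][1])
    # Score all records sharing a date before any of their labels is used.
    for _, same_day in groupby(order, key=lambda i: records[i][1]):
        same_day = list(same_day)
        prior = total_fails / total_seen if total_seen else 0.0
        for i in same_day:
            maker = records[i][0]
            rates[i] = (fails[maker] + alpha * prior) / (seen[maker] + alpha)
        # Only now do this day's outcomes enter the history.
        for i in same_day:
            maker, _, label = records[i]
            fails[maker] += label
            seen[maker] += 1
            total_fails += label
            total_seen += 1
    return rates
```

For records `("A", "2021-01-05", 1)`, `("A", "2021-02-10", 0)`, `("B", "2021-02-10", 0)` and `("A", "2021-03-01", 0)`, the function returns 0.0, 1.0, 1.0 and 4/9 (about 0.444). The two same-day records see only the first inspection, and the last one blends A's 1 nonconformity in 2 past inspections with the overall past rate of 1/3. Using only earlier outcomes keeps the variable usable at inspection time, when the current label is not yet known.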
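Because nonconforming items are rare, class imbalance affects both how models are scored and how they are trained. The sketch below, assuming binary labels with 1 for nonconforming, reports precision, recall and F1 for the rare class next to accuracy, and balances a training set by plain random oversampling of the minority class. Random oversampling is swapped in here only for illustration; the abstract does not state which imbalance treatment the authors used, and SMOTE is a common alternative. The function names are hypothetical.

```python
import random


def imbalance_report(y_true, y_pred, positive=1):
    """Accuracy alongside precision, recall and F1 for the rare class.

    With few nonconforming items, accuracy alone rewards a model that
    simply predicts "conforming" for everything.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def random_oversample(X, y, minority=1, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes
    have the same count (binary labels assumed).

    Apply to the training split only, so duplicated rows never leak
    into the evaluation data.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    if not minority_idx:
        raise ValueError("no minority-class rows to resample")
    shortfall = (len(y) - len(minority_idx)) - len(minority_idx)
    extra = [rng.choice(minority_idx) for _ in range(max(shortfall, 0))]
    X_out = list(X) + [X[i] for i in extra]
    y_out = list(y) + [y[i] for i in extra]
    return X_out, y_out
```

On 98 conforming and 2 nonconforming records, a model that labels everything conforming scores 0.98 accuracy but 0.0 recall and 0.0 F1, which is why rare-class metrics are the more informative basis for comparing detection models.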


Acknowledgement

This research was supported by a grant (21163MFDS516) from the Ministry of Food and Drug Safety in 2022.
