Prediction of Agricultural Purchases Using Structured and Unstructured Data: Focusing on Paprika

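  • 이경희 (BigData Labs Co., Ltd.) ;
  • 라형철 (Research Institute of Veterinary Medicine, Chungbuk National University) ;
  • 최은선 (Interdisciplinary Program in Big Data, Chungbuk National University) ;
  • 조완섭 (Department of Management Information Systems, Chungbuk National University)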
  • Received : 2021.11.29
  • Accepted : 2021.12.10
  • Published : 2021.12.31

Abstract

Consumers' food consumption behavior is increasingly likely to be influenced not only by the factors captured in structured data, such as consumer panel data, but also by unstructured data from mass media and social media. This study builds and validates deep learning-based consumption prediction models on a fused data set that links structured and unstructured data related to food consumption. Model accuracy improved when structured and unstructured data were combined, indicating that unstructured data add predictive power. A SHAP analysis of variable importance showed that variables derived from blog and video data ranked among the most important and were positively correlated with the amount of paprika purchased. The experiments also showed that the machine learning model achieved higher accuracy than the deep learning model and could be an efficient alternative to conventional time series modeling.
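The comparison described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the feature names (`price`, `blog_mentions`, `video_mentions`) and coefficients are hypothetical, and an ordinary least-squares model stands in for the study's machine learning and deep learning models. The SHAP step uses the closed form for a linear model with independent features, phi_j = w_j * (x_j - mean(x_j)), rather than the `shap` library.

```python
import random

def fit_ols(X, y):
    """Least-squares fit with intercept; returns [intercept, w1, w2, ...]."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Normal equations (A^T A) w = A^T y, stored as an augmented matrix.
    M = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]

def predict(w, X):
    return [w[0] + sum(wj * xj for wj, xj in zip(w[1:], r)) for r in X]

def rmse(y, yhat):
    return (sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)) ** 0.5

# Synthetic weekly records (NOT the study's data): one structured feature
# and two hypothetical unstructured mention counts.
rng = random.Random(0)
names = ["price", "blog_mentions", "video_mentions"]
X = [[rng.gauss(10, 1), rng.gauss(20, 5), rng.gauss(8, 2)] for _ in range(200)]
y = [30 - 1.0 * p + 0.8 * b + 0.5 * v + rng.gauss(0, 1) for p, b, v in X]
X_tr, X_te, y_tr, y_te = X[:150], X[150:], y[:150], y[150:]

# Structured-only model versus fused (structured + unstructured) model.
w_struct = fit_ols([r[:1] for r in X_tr], y_tr)
w_fused = fit_ols(X_tr, y_tr)
rmse_struct = rmse(y_te, predict(w_struct, [r[:1] for r in X_te]))
rmse_fused = rmse(y_te, predict(w_fused, X_te))

# Linear SHAP values relative to the training means (the background data).
means = [sum(r[j] for r in X_tr) / len(X_tr) for j in range(len(names))]
base_value = w_fused[0] + sum(w_fused[j + 1] * means[j] for j in range(len(names)))
shap_values = [[w_fused[j + 1] * (r[j] - means[j]) for j in range(len(names))]
               for r in X_te]

# Global importance: mean absolute SHAP value per feature.
importance = {n: sum(abs(s[j]) for s in shap_values) / len(shap_values)
              for j, n in enumerate(names)}
ranking = sorted(names, key=importance.get, reverse=True)

print("RMSE structured only:", round(rmse_struct, 3))
print("RMSE fused:", round(rmse_fused, 3))
print("Features by mean |SHAP|:", ranking)
```

On this synthetic data the fused model has the lower hold-out error and the mention counts carry positive weights, which mirrors the direction of the reported findings but says nothing about their magnitude. For tree-based or neural models the closed form no longer applies, and a dedicated SHAP implementation would be needed.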

Acknowledgement

This research was supported by a government-funded research program of the Ministry of Food and Drug Safety (project number: KMDF-RnD 21163수입안 517-1).
