Classification of Parent Company's Downward Business Clients Using Random Forest: Focused on Value Chain at the Industry of Automobile Parts

  • Kim, Teajin (Department of Industrial and Information Systems, Public Policy and Information Technology Professional Graduate School, SNUT) ;
  • Hong, Jeongshik (Department of Industry Information System Engineering, SNUT) ;
  • Jeon, Yunsu (Department of Data Science, SNUT) ;
  • Park, Jongryul (Department of Data Science, SNUT) ;
  • An, Teayuk (Business on Communication, Ltd)
  • Received : 2017.12.01
  • Accepted : 2018.02.22
  • Published : 2018.02.28

Abstract

The value chain has been used as a strategic tool for strengthening competitive advantage, mainly at the enterprise and industry levels. To conduct value chain analysis at the enterprise level, however, the client companies of the parent company must first be classified according to whether they belong to its value chain. Experts can readily establish the value chain for a single company, but doing so for many companies takes considerable cost and time. This study therefore proposes a model that automatically classifies the companies that form a value chain based on actual transaction data. A total of 19 transaction attribute variables were extracted from the transaction data and processed into input data for machine learning. The proposed model was constructed using the Random Forest algorithm, and the experiment was conducted on an automobile parts company. The experimental results demonstrate that the proposed model can automatically classify the client companies of the parent company with 92% accuracy, a 76% F1-score, and 94% AUC. The empirical study also confirms that a few transaction attributes, such as transaction concentration, transaction amount, and total sales per customer, are the main characteristics representing the companies that form a value chain.

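The abstract reports accuracy, F1-score, and AUC side by side, and the gap between 92% accuracy and a 76% F1-score is what one would expect when value-chain companies are a minority of all clients. The following is a minimal sketch of how the three metrics are computed for a binary classifier. The labels and scores are made-up illustration data, not taken from the study, and the 0.5 decision threshold is an assumption. Label 1 stands for "client belongs to the value chain".

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of clients whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. This pairwise form equals the area under the
    ROC curve and is fine for small samples.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: 10 client companies, 3 of which are in the value chain.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical classifier scores (e.g. the share of trees voting for class 1).
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05]
# Assumed decision threshold of 0.5.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")  # 0.800
print(f"F1-score = {f1_score(y_true, y_pred):.3f}")  # 0.667
print(f"AUC      = {auc(y_true, scores):.3f}")       # 0.952
```

In this toy case a single missed value-chain company and a single false alarm leave accuracy at 0.8 but pull the F1-score down to about 0.67, because F1 ignores the many correctly rejected non-value-chain clients.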
