Development of the Accident Prediction Model for Enlisted Men through an Integrated Approach to Datamining and Textmining

  • Yoon, Seungjin (Dept. of Military Operation Research, Korea National Defense University) ;
  • Kim, Suhwan (Dept. of Military Operation Research, Korea National Defense University) ;
  • Shin, Kyungshik (Ewha School of Business, Ewha Womans University)
  • Received : 2015.06.05
  • Accepted : 2015.06.16
  • Published : 2015.09.30

Abstract

In this paper, we report our findings on a model for predicting accidents in the military based on enlisted men's internal data (cumulative records) and external data (SNS data). This work supports the military's efforts to supervise enlisted men. Despite these efforts, many commanders have failed to prevent accidents involving their subordinates. One of the most important duties of officers is to take care of their subordinates and prevent unexpected accidents; however, such accidents are hard to prevent, so a more suitable method is needed. Our motivation for this paper is to make it possible to predict accidents using enlisted men's internal and external data. The biggest issue facing the military is the occurrence of accidents involving enlisted men that stem from maladjustment and the relaxation of military discipline. The core method of preventing such accidents is to identify problems and manage them quickly. Commanders predict accidents by interviewing their soldiers and observing their surroundings, which requires considerable time and effort, and the results vary significantly with each commander's capabilities.

In this paper, we seek to predict accidents with objective data that can be obtained easily. Today, enlisted men's records, together with SNS communication between commanders and soldiers, make it possible to predict and prevent accidents. This paper applies data mining to internal and external (SNS) data to identify soldiers' interests and predict accidents. We propose a method combining topic analysis and decision trees, carried out in two steps. First, topic analysis is performed on the enlisted men's SNS posts. Second, a decision tree is used to analyze the internal data together with the results of the first step. The dependent variable in this analysis is whether an accident occurred. Analyzing the SNS data requires text mining and topic analysis tools.
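The two-step design above can be sketched as a feature-building step that attaches each soldier's topic scores to the internal variables before a decision tree is trained. This is a minimal illustration, not the authors' SAS Enterprise Miner workflow: the input formats, the rule of averaging post-level topic weights, the zero default for soldiers without posts, and the `topic_` column prefix are all assumptions; only the five topic labels come from the abstract.

```python
from collections import defaultdict

# The five topic labels come from the abstract; everything else here is an
# assumed, simplified stand-in for the authors' SAS-based workflow.
TOPICS = ("vacation", "friends", "stress", "training", "sports")


def topic_points(posts):
    """Average post-level topic weights for each soldier.

    posts: iterable of (soldier_id, {topic: weight}) pairs. The weights are
    assumed to come from an upstream topic model (hypothetical input format).
    Topics missing from a post count as 0; unknown topics are ignored.
    """
    sums = defaultdict(lambda: dict.fromkeys(TOPICS, 0.0))
    counts = defaultdict(int)
    for soldier_id, weights in posts:
        counts[soldier_id] += 1
        for topic in TOPICS:
            sums[soldier_id][topic] += weights.get(topic, 0.0)
    return {
        sid: {topic: total / counts[sid] for topic, total in per_topic.items()}
        for sid, per_topic in sums.items()
    }


def build_features(internal, points):
    """Append topic points to each soldier's internal variables.

    internal: {soldier_id: {variable: value}}, e.g. the internal data items.
    Soldiers without SNS posts get 0.0 for every topic (an assumption).
    """
    rows = {}
    for sid, variables in internal.items():
        row = dict(variables)  # copy so the caller's data is not modified
        soldier_points = points.get(sid, {})
        for topic in TOPICS:
            row["topic_" + topic] = soldier_points.get(topic, 0.0)
        rows[sid] = row
    return rows


if __name__ == "__main__":
    posts = [
        ("s1", {"stress": 0.8, "training": 0.2}),
        ("s1", {"stress": 0.4, "vacation": 0.6}),
        ("s2", {"sports": 1.0}),
    ]
    # "service_months" is a hypothetical internal variable.
    internal = {"s1": {"service_months": 5}, "s2": {"service_months": 12}}
    for sid, row in build_features(internal, topic_points(posts)).items():
        print(sid, row)
```

Keeping the join separate from the scoring mirrors the two steps in the text: the same internal table can be fed to the tree with or without the `topic_` columns, which is how an internal-only model and an internal-plus-SNS model could be compared.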
We used SAS Enterprise Miner 12.1, which provides a Text Miner module. Our approach to identifying soldiers' interests consists of three phases: data collection, topic analysis, and conversion of the topic analysis results into scores for use as independent variables. In the first phase, we collect enlisted men's SNS data using the commanders' IDs. In the second phase, after the unstructured SNS data are gathered, topic analysis extracts issues from them; for simplicity, 5 topics (vacation, friends, stress, training, and sports) are extracted from 20,000 articles. In the third phase, we quantify these 5 topics as personal scores. We then add these scores to the independent variables, which consist of 15 internal data items, and build two decision trees. The first tree uses only the internal data; the second uses the external data (SNS) as well as the internal data. We then compare the misclassification rates reported by SAS Enterprise Miner. The first model's misclassification rate is 12.1%, whereas the second model's is 7.8%, so the second model predicts accidents with an accuracy of approximately 92% (1 - 0.078 = 0.922). The gap between the two models is 4.3 percentage points. Finally, we use the McNemar test to check whether this difference is meaningful; it is statistically significant (p-value = 0.0003). This study has two limitations. First, the results cannot be generalized, mainly because the experiment is limited to data on a small number of enlisted men. Second, several independent variables in the decision tree model are treated as categorical rather than continuous variables, which causes a loss of information. Despite extensive efforts to provide prediction models for the military, commanders' predictions are accurate only when they have sufficient data about their subordinates. Our proposed methodology can support decision-making in the military.
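The error-rate comparison and McNemar test described above can be illustrated with paired predictions from two classifiers on the same soldiers. This is a sketch assuming 0/1 labels (1 = accident). It uses the continuity-corrected chi-square form of McNemar's test; the abstract does not say which variant the authors used, and the per-soldier predictions needed to reproduce the reported p-value are not given, so the data below are hypothetical.

```python
import math


def misclassification_rate(y_true, y_pred):
    """Share of cases where the prediction differs from the true label."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)


def mcnemar(y_true, pred_a, pred_b, correction=True):
    """McNemar test comparing two classifiers on the same cases.

    b counts cases model A classifies correctly and model B does not;
    c counts the reverse. Returns (chi-square statistic, p-value) using the
    chi-square distribution with 1 degree of freedom.
    """
    b = c = 0
    for t, a, p in zip(y_true, pred_a, pred_b):
        if a == t and p != t:
            b += 1
        elif a != t and p == t:
            c += 1
    if b + c == 0:
        return 0.0, 1.0  # the models never disagree on correctness
    diff = abs(b - c)
    if correction:
        diff = max(diff - 1, 0)  # Edwards' continuity correction
    stat = diff ** 2 / (b + c)
    # Survival function of chi-square(1): P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical data for 100 soldiers, all labelled 1 for simplicity.
    y_true = [1] * 100
    pred_internal = [0] * 20 + [1] * 80            # wrong on cases 0-19
    pred_with_sns = [1] * 20 + [0] * 5 + [1] * 75  # wrong on cases 20-24
    print(misclassification_rate(y_true, pred_internal))  # 0.2
    print(misclassification_rate(y_true, pred_with_sns))  # 0.05
    print(mcnemar(y_true, pred_internal, pred_with_sns))
```

Only the discordant cases (where exactly one model is correct) enter the statistic. In this hypothetical example there are 5 and 20 such cases, so the corrected statistic is (|5 - 20| - 1)^2 / 25 = 7.84.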
This study is expected to help prevent accidents in the military through scientific analysis and proper management of enlisted men.

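Recently, the most pressing issue in the military has been accidents involving enlisted personnel caused by lax discipline and maladjustment to military service. The key to preventing such accidents is to identify and manage, in advance, the problems that may lead to them. To this end, commanders make their own efforts, such as interviewing soldiers, patrolling living quarters, and talking with soldiers' parents; in practice, however, the ability to detect warning signs of accidents varies greatly with each commander's individual competence. To overcome this problem, this study attempts to predict accidents using objective data that any commander can easily obtain. Because soldiers' personal guidance records are now well maintained in databases, and commanders also gather information by communicating with soldiers on SNS, we judged that converting this information into data and using it well would make it possible to predict and prevent accidents among soldiers. This study addresses the data mining problem of using soldiers' internal data (personal guidance records) and external data (SNS) to identify their interests, predict accidents, and apply the results to command, and it proposes topic analysis and decision trees as the method. The study proceeded in two streams. First, topics were extracted from the soldiers' SNS posts and converted into independent variables; second, these topic analysis results were added as independent variables to the soldiers' internal data, and a decision tree was built. The dependent variable is whether a soldier was involved in an accident. The analysis showed strong predictive power, with an accident prediction accuracy of approximately 92%. If, building on this study, accident risk among service members is analyzed scientifically and managed in a tailored way, it is expected to contribute to preventing various accidents in the military before they occur.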
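The decision-tree step described above grows a tree by repeatedly choosing the split that best separates soldiers with and without accidents. The abstract does not state the splitting criterion used in SAS Enterprise Miner, so the sketch below uses Gini impurity (the CART criterion) purely to illustrate how a single split is chosen; the feature names and data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (1 = accident)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, labels, features):
    """Find the threshold split 'feature <= t' with the lowest weighted Gini.

    rows: list of {feature: numeric value}; labels: matching 0/1 list.
    Returns (feature, threshold, weighted_gini), or None if no split exists.
    """
    n = len(labels)
    best = None
    for feature in features:
        values = sorted({row[feature] for row in rows})
        # Each distinct value except the largest is a candidate threshold.
        for threshold in values[:-1]:
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (feature, threshold, score)
    return best


if __name__ == "__main__":
    # Hypothetical soldiers described by two topic scores.
    rows = [
        {"topic_stress": 0.1, "topic_sports": 0.5},
        {"topic_stress": 0.2, "topic_sports": 0.1},
        {"topic_stress": 0.8, "topic_sports": 0.4},
        {"topic_stress": 0.9, "topic_sports": 0.2},
    ]
    labels = [0, 0, 1, 1]
    print(best_split(rows, labels, ["topic_stress", "topic_sports"]))
    # ('topic_stress', 0.2, 0.0)
```

A full tree would apply `best_split` recursively to each side until a stopping rule (for example, a minimum leaf size) is met; that recursion is omitted here to keep the sketch short.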
