Clustering and classification to characterize daily electricity demand

Clustering and classification analysis of hourly electricity demand time series patterns

  • Park, Dain (Department of Statistics, Graduate School, Daegu University) ;
  • Yoon, Sanghoo (Department of Statistics and Computer Science, Daegu University & Institute of Basic Science, Daegu University)
  • Received : 2017.02.28
  • Accepted : 2017.03.27
  • Published : 2017.03.31


The purpose of this study is to identify patterns of daily electricity demand through clustering and classification. Hourly demand data were collected by KPX (Korea Power Exchange) between 2008 and 2012. Because electricity demand is time series data, the time trend was removed before the daily demand patterns were examined. We considered k-means clustering, Gaussian mixture model clustering, and functional clustering in order to find the best clustering method. Classification analysis was then conducted to understand how the clusters relate to external factors: day of the week, holidays, and weather. The data were divided into training data and test data. The training data consisted of the external factors and cluster labels for 2008 to 2011; the test data were the daily external factors for 2012. Decision tree, random forest, support vector machine, and Naive Bayes classifiers were used. Gaussian mixture model-based clustering combined with random forest showed the best prediction performance when the number of clusters was 8.
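The clustering step can be illustrated with a minimal sketch in standard-library Python. It is not the authors' implementation. Several things here are assumptions made for illustration: the synthetic "weekday" and "holiday" curves stand in for the KPX data, subtracting each day's mean is a crude substitute for removing the trend component, the farthest-first initialization is chosen only to keep the run deterministic, and two clusters are used rather than the eight reported in the study. All function names are hypothetical.

```python
import random
from statistics import mean


def daily_curve(kind, level, rng):
    """Synthetic 24-hour demand curve (hypothetical shapes, not KPX data)."""
    curve = []
    for hour in range(24):
        if kind == "weekday":
            value = 50 + (20 if 9 <= hour <= 18 else 0)   # daytime plateau
        else:
            value = 45 + (10 if 19 <= hour <= 22 else 0)  # evening bump
        curve.append(level + value + rng.gauss(0, 1))
    return curve


def remove_level(curve):
    """Subtract the day's mean so only the within-day shape remains.
    Assumption: a simple stand-in for removing the trend component."""
    m = mean(curve)
    return [v - m for v in curve]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def farthest_first(points, k):
    """Deterministic initialization: start from the first point, then
    repeatedly add the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    """Plain k-means: alternate assignment and mean update until stable."""
    centers = farthest_first(points, k)
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster happens to be empty.
            new_centers.append([mean(col) for col in zip(*members)]
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return assign(points, centers), centers


rng = random.Random(0)
kinds = ["holiday" if d % 7 in (5, 6) else "weekday" for d in range(56)]
# The level grows by 0.5 per day to mimic a slow upward trend.
days = [daily_curve(kind, 0.5 * d, rng) for d, kind in enumerate(kinds)]
shapes = [remove_level(curve) for curve in days]
labels, centers = kmeans(shapes, k=2)
```

Because the level of each day is removed before clustering, days are grouped by the shape of their 24-hour curve rather than by how much demand has grown over the years.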

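Electricity demand forecasting is essential for the efficient operation of power supply systems. This study uses clustering and classification analysis to examine the types of hourly electricity demand patterns within a day. Hourly electricity demand data collected by the Korea Power Exchange from January 1, 2008 to December 31, 2012 were converted into time series consisting of trend, seasonal, and error components. To distinguish the patterns of the detrended series, we considered k-means clustering, Gaussian mixture model clustering, and functional clustering. Principal component analysis first reduced the 24-hour data to two components, after which k-means clustering, Gaussian mixture model clustering, and functional clustering were performed. Based on the clustering results, four classification methods, namely decision tree, RF (random forest), Naive Bayes, and SVM (support vector machine), were trained on the four years of data from 2008 to 2011 and used to predict the clusters for 2012. Cluster prediction based on Gaussian mixture model clustering and RF gave the best performance.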
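The train-on-2008-2011, predict-2012 design can be sketched as follows in standard-library Python. This is an illustration under stated assumptions, not the study's code. A small categorical Naive Bayes classifier is used only because it is easy to write without external packages (the study reports random forest as the best performer). The two-date holiday calendar and the "work"/"rest" labels are hypothetical stand-ins for the real calendar and the cluster labels, and weather variables are omitted.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta
from math import log


def daterange(start, end):
    """Yield every date from start to end, inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


# Hypothetical holiday calendar: two fixed dates per year, for illustration.
holidays = {date(y, m, d) for y in range(2008, 2013)
            for m, d in [(1, 1), (8, 15)]}


def features(day):
    """External factors for one day: (day of week, holiday flag)."""
    return (day.weekday(), day in holidays)


def toy_label(day):
    """Stand-in for a cluster label produced by a clustering step."""
    return "rest" if day.weekday() >= 5 or day in holidays else "work"


def fit_naive_bayes(X, y):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(y)
    feat_counts = defaultdict(Counter)   # (class, feature index) -> counts
    values = defaultdict(set)            # feature index -> observed values
    for x, c in zip(X, y):
        for i, v in enumerate(x):
            feat_counts[(c, i)][v] += 1
            values[i].add(v)
    return class_counts, feat_counts, values


def predict(model, x):
    """Return the class with the highest log posterior (Laplace smoothing)."""
    class_counts, feat_counts, values = model
    total = sum(class_counts.values())

    def log_posterior(c):
        score = log(class_counts[c] / total)
        for i, v in enumerate(x):
            score += log((feat_counts[(c, i)][v] + 1)
                         / (class_counts[c] + len(values[i])))
        return score

    return max(class_counts, key=log_posterior)


# Split by year: 2008-2011 for training, 2012 held out for testing.
train_days = list(daterange(date(2008, 1, 1), date(2011, 12, 31)))
test_days = list(daterange(date(2012, 1, 1), date(2012, 12, 31)))

model = fit_naive_bayes([features(d) for d in train_days],
                        [toy_label(d) for d in train_days])
predicted = [predict(model, features(d)) for d in test_days]
accuracy = sum(p == toy_label(d)
               for p, d in zip(predicted, test_days)) / len(test_days)
print(f"accuracy on 2012: {accuracy:.3f}")
```

Splitting by year rather than at random keeps the evaluation honest for time-ordered data: the classifier never sees any day from the year it is asked to predict.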


Supported by : Daegu University

