Figure 2.1. Histogram of each response variable.
Figure 2.2. Yearly average of each response variable.
Figure 2.3. Monthly average of each response variable.
Figure 2.4. Response variable vs. Quantile of duration.
Figure 2.5. Response variable vs. Number of subway lines.
Figure 2.6. Response variable vs. Number of exits.
Figure 3.1. Location of subway stations by group.
Figure 3.2. Monthly average of each response variable by group.
Figure 3.3. Yearly average of each response variable by group.
Figure 3.4. Usage area proportion within a radius of 500m around the station.
Figure 3.5. Partial dependence plot of all stations (Model 1).
Table 2.1. Number of subway stations by the number of subway lines
Table 2.2. Description of variables
Table 3.1. Average of each predictor by group
Table 3.2. Test root mean squared error of each model using 10-fold cross-validation (Model 1)
Table 3.3. Test root mean squared error of each model using 10-fold cross-validation (Model 2)
Table 3.4. Test root mean squared error of each group (Model 1)
Table 3.5. Test root mean squared error of each group (Model 2)
Table 3.6. Predicted number of passengers at 8 new stations (2018)
References
- Breiman, L., Friedman, J., Olshen, R., and Stone, C. (1984). Classification and Regression Trees, Chapman and Hall, New York.
- Breiman, L. (2001). Random forests, Machine Learning, 45, 5-32. https://doi.org/10.1023/A:1010933404324
- Cortes, C. and Vapnik, V. (1995). Support-vector networks, Machine Learning, 20, 273-297. https://doi.org/10.1007/BF00994018
- Hoerl, A. E. and Kennard, R. W. (1970). Ridge regression: biased estimation for nonorthogonal problems, Technometrics, 12, 55-67. https://doi.org/10.1080/00401706.1970.10488634
- Reynolds, D. (2015). Gaussian mixture models, Encyclopedia of Biometrics.
- Kim, J. I. (2013). The determinants of subway riderships at AM-peak in Daegu metropolitan city: focusing on the land use of station neighborhood areas, Journal of Transport Research, 20, 15-25. https://doi.org/10.34143/JTR.2013.20.1.15
- Kim, J. S. (2016). Subway congestion prediction and recommendation system using big data analysis, Journal of Digital Convergence, 14, 289-295. https://doi.org/10.14400/JDC.2016.14.11.289
- Lee, J., Go, J. Y., Jeon, S., and Jun, C. (2015). A study of land use characteristics by types of subway station areas in Seoul analyzing patterns of transit ridership, The Korea Spatial Planning Review, 84, 35-53. https://doi.org/10.15793/kspr.2015.84..003
- R Development Core Team (2010). R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, ISBN 3-900051-07-0. http://www.R-project.org.
- Ridgeway, G. (2007). Generalized Boosted Models: A guide to the gbm package, https://cran.r-project.org/web/packages/gbm
- Shon, E. Y., Kwon, B. W., and Lee, M. H. (2004). Modelling the subway demand estimation by station using the multiple regression analysis by category, Journal of Korea Society of Transportation, 22, 33-42.
- Song, J. (1991). A study on prediction of passenger demand in Seoul Subway, Statistical Consulting, 6.
- Chen, T. and Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System, KDD '16 Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785-794.
- Tibshirani, R. (1996). Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society B, 58, 267-288.