Fig. 1. Learning process in general ANN
Fig. 2. Architecture of generic CNN model
Fig. 3. Zero padding
Fig. 4. ReLU function
Fig. 5. Resizing feature map by max pooling
Fig. 6. Anchor boxes for object detection
Fig. 7. Demonstration of nine possible anchor boxes
Fig. 8. Mask R-CNN model architecture
Fig. 9. Progress of CNN: From object detection to instance segmentation
Fig. 10. Examples of RGB image and corresponding annotation data
Fig. 11. A sample training image
Fig. 12. Rotated image without padding
Fig. 13. Mirror padding for image rotation
Fig. 14. Building detection with geometrically transformed images
Fig. 15. Building detection with radiometrically degraded images
Fig. 16. Building detection from unseen images
References
- Audebert, N., Le Saux, B., and Lefevre, S. (2018), Beyond RGB: Very high resolution urban remote sensing with multimodal deep networks, ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 140, pp. 20-32. https://doi.org/10.1016/j.isprsjprs.2017.11.011
- Back, C.S. and Yom, J.H. (2018), Comparison of point cloud volume calculated by artificial intelligence learning method and photogrammetric method, Proceedings of Korean Society of Surveying, Geodesy, Photogrammetry and Cartography, 19-20 April, Yongin, Korea, pp. 227-230.
- Ball, J., Anderson, D., and Chan, C. (2017), A comprehensive survey of deep learning in remote sensing: Theories, tools and challenges for the community, Journal of Applied Remote Sensing, Vol. 11, No. 4, pp. 1-54.
- Campos-Taberner, M., Romero-Soriano, A., Gatta, C., Camps-Valls, G., Lagrange, A., Le Saux, B., Beaupere, A., Boulch, A., Chan-Hon-Tong, A., Herbin, S., Randrianarivo, H., Ferecatu, M., Shimoni, M., Moser, G., and Tuia, D. (2016), Processing of extremely high-resolution LiDAR and RGB data: Outcome of the 2015 IEEE GRSS data fusion contest-Part A: 2-D contest, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, Vol. 9, No. 12, pp. 5547-5559. https://doi.org/10.1109/JSTARS.2016.2569162
- Choe, Y.J. and Yom, J.H. (2017), Downscaling of MODIS land surface temperature to LANDSAT scale using multi-layer perceptron, Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography, Vol. 35, No. 4, pp. 313-318. (in Korean with English abstract) https://doi.org/10.7848/KSGPC.2017.35.4.313
- Chung, D. and Lee, I. (2017), Point cloud classification based on deep learning, Proceedings of Korean Society of Surveying, Geodesy, Photogrammetry, and Cartography, Yeosu, Korea, pp. 110-113. (in Korean with English abstract)
- Deng, Z., Sun, H., Zhou, S., Zhao, J., Lei, L., and Zou, H. (2018), Multi-scale object detection in remote sensing imagery with convolutional neural networks, ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 145, pp. 3-22. https://doi.org/10.1016/j.isprsjprs.2018.04.003
- Garcia-Garcia, A., Orts-Escolano, S., Oprea, S., Villena-Martinez, V., and Garcia-Rodriguez, J. (2017), A review on deep learning techniques applied to semantic segmentation, arXiv:1704.06857.
- Girshick, R. (2015), Fast R-CNN, IEEE International Conference on Computer Vision, ICCV 2015, 13-16 December, Santiago, Chile, pp. 1440-1448.
- Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2016), Region-based convolutional networks for accurate object detection and segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 38, No. 1, pp. 1-16. https://doi.org/10.1109/TPAMI.2016.2592468
- Hazirbas, C., Ma, L., Domokos, C., and Cremers, D. (2016), FuseNet: Incorporating depth into semantic segmentation via fusion-based CNN architecture, Proceedings of the Asian Conference on Computer Vision, Vol. 2, 20-24 November, Taipei, Taiwan.
- He, K., Gkioxari, G., Dollar, P., and Girshick, R. (2017), Mask R-CNN, Proceedings of IEEE International Conference on Computer Vision (ICCV) 2017, 22-29 October, Venice, Italy, pp. 2980-2988.
- Hertz, J., Krogh, A., and Palmer, R. (1991), Introduction to the Theory of Neural Computation, Addison-Wesley, Reading, MA, 327p.
- Kang, J., Korner, M., Wang, Y., Taubenbock, H., and Zhu, X. (2018), Building instance classification using street view images, ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 145, pp. 44-59. https://doi.org/10.1016/j.isprsjprs.2018.02.006
- Kemker, R., Salvaggio, C., and Kanan, C. (2018), Algorithms for semantic segmentation of multispectral remote sensing imagery using deep learning, ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 145, pp. 60-77. https://doi.org/10.1016/j.isprsjprs.2018.04.014
- Kim, H. and Bae, T. (2017), Preliminary study of deep learning-based precipitation prediction, Journal of the Korean Society of Surveying, Geodesy, Photogrammetry, and Cartography, Vol. 35, No. 5, pp. 423-430. https://doi.org/10.7848/KSGPC.2017.35.5.423
- Krizhevsky, A., Sutskever, I., and Hinton, G. (2012), ImageNet classification with deep convolutional neural networks, Proceedings of the 25th International Conference on Neural Information Processing Systems, Vol. 1, 3-8 December, Lake Tahoe, Nevada, pp. 1097-1105.
- LeCun, Y., Boser, B., Denker, J., Henderson, D., Howard, R., Hubbard, W., and Jackel, L. (1989), Backpropagation applied to handwritten zip code recognition, Neural Computation, Vol. 1, No. 4, pp. 541-551. https://doi.org/10.1162/neco.1989.1.4.541
- Lee, G. and Yom, J.H. (2018), Design and implementation of web-based automatic preprocessing system of remote sensing imagery for machine learning modeling, Journal of the Korean Society for Geospatial Information Science, Vol. 26, No. 1, pp. 61-67. (in Korean with English abstract)
- Long, J., Shelhamer, E., and Darrell, T. (2015), Fully convolutional networks for semantic segmentation, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 7-12 June, Boston, MA, pp. 3431-3440.
- Marmanis, D., Wegner, J., Galliani, S., Schindler, K., Datcu, M., and Stilla, U. (2016), Semantic segmentation of aerial images with an ensemble of CNNs, ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. 3-3, XXIII ISPRS Congress, 12-19 July, Prague, Czech Republic, pp. 473-480.
- Maturana, D. and Scherer, S. (2015), 3D Convolutional neural networks for landing zone detection from LiDAR, IEEE International Conference on Robotics and Automation, Seattle, Washington, 26-30 May, pp. 3471-3478.
- McCulloch, W. and Pitts, W. (1943), A logical calculus of the ideas immanent in nervous activity, Bulletin of Mathematical Biophysics, Vol. 5, pp. 115-133.
- Oh, H. (2010), Landslide detection and landslide susceptibility mapping using aerial photos and artificial neural networks, Korean Journal of Remote Sensing, Vol. 26, No. 1, pp. 47-57. (in Korean with English abstract)
- Pang, Y., Sun, M., Jiang, X., and Li, X. (2018), Convolution in convolution for network in network, IEEE Transactions on Neural Networks and Learning Systems, Vol. 29, No. 5, pp. 1587-1597. https://doi.org/10.1109/TNNLS.2017.2676130
- Parthasarathy, D. (2017), A brief history of CNNs in image segmentation: From R-CNN to Mask R-CNN, https://blog.athelas.com/a-brief-history-of-cnns-in-image-segmentation-from-r-cnn-to-mask-r-cnn-34ea83205de4 (last date accessed: 6 September 2018).
- Ren, S., He, K., Girshick, R., and Sun, J. (2017), Faster R-CNN: Towards real-time object detection with region proposal networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 39, No. 6, pp. 1137-1149. https://doi.org/10.1109/TPAMI.2016.2577031
- Rosenblatt, F. (1958), The perceptron: A probabilistic model for information storage and organization in the brain, Psychological Review, Vol. 65, No. 6, pp. 386-408. https://doi.org/10.1037/h0042519
- Rumelhart, D., Hinton, G., and Williams, R. (1986), Learning representations by back-propagating errors, Nature, Vol. 323, No. 6088, pp. 533-536. https://doi.org/10.1038/323533a0
- Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., and Berg, A. (2015), ImageNet large scale visual recognition challenge, International Journal of Computer Vision, Vol. 115, No. 3, pp. 211-252. https://doi.org/10.1007/s11263-015-0816-y
- Schenk, T. (1999), Digital Photogrammetry: Volume 1, TerraScience, Laurelville, OH, 428p.
- Shaikh, F. (2018), Automatic image captioning using deep learning (CNN and LSTM) in PyTorch, Analytics Vidhya, https://www.analyticsvidhya.com/blog/2018/04/solving-an-image-captioning-task-using-deep-learning/ (last date accessed: 31 October 2018).
- Simard, P., Steinkraus, D., and Platt, J. (2003), Best practices for convolutional neural networks applied to visual document analysis, Proceedings of the Seventh International Conference on Document Analysis and Recognition, ICDAR 2003, 3-6 August, Vol. 2, pp. 958-962.
- Tokarczyk, P., Wegner, J., Walk, S., and Schindler, K. (2015), Features, color spaces, and boosting: New insights on semantic classification of remote sensing images, IEEE Transactions on Geoscience and Remote Sensing, Vol. 53, No. 1, pp. 280-295. https://doi.org/10.1109/TGRS.2014.2321423
- You, Q., Jin, H., Wang, Z., Fang, C., and Luo, J. (2016), Image captioning with semantic attention, IEEE Conference on Computer Vision and Pattern Recognition, 26 June-1 July, Las Vegas, Nevada, pp. 4651-4659.
- Vo, A.V., Truong-Hong, L., Laefer, D., Tiede, D., d'Oleire-Oltmanns, S., Baraldi, A., Shimoni, M., Moser, G., and Tuia, D. (2016), Processing of extremely high-resolution LiDAR and RGB data: Outcome of the 2015 IEEE GRSS data fusion contest-Part B: 3-D contest, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, Vol. 9, No. 12, pp. 5560-5575. https://doi.org/10.1109/JSTARS.2016.2581843
- Wang, S., Quan, D., Liang, X., Ning, M., Guo, Y., and Jiao, L. (2018), A deep learning framework for remote sensing image registration, ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 145, pp. 148-164. https://doi.org/10.1016/j.isprsjprs.2017.12.012
- Xing, Y., Wang, M., Yang, S., and Jiao, L. (2018), Pansharpening via deep metric learning, ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 145, pp. 165-183. https://doi.org/10.1016/j.isprsjprs.2018.01.016
- Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhutdinov, R., Zemel, R., and Bengio, Y. (2015), Show, attend and tell: Neural image caption generation with visual attention, International Conference on Machine Learning, 6-11 July, Lille, France, pp. 2048-2057.
- Zhang, B., Gu, J., Chen, C., Han, J., Su, X., Cao, X., and Liu, J. (2018), One-two-one networks for compression artifacts in remote sensing, ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 145, pp. 184-196. https://doi.org/10.1016/j.isprsjprs.2018.01.003
Cited by
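- Object classification and change detection in point clouds using deep learning vol.50, pp.2, 2018, https://doi.org/10.22640/lxsiri.2020.50.2.37
- Building detection using a convolutional neural network based on the fusion of infrared imagery, LiDAR data, and feature information vol.38, pp.6, 2020, https://doi.org/10.7848/ksgpc.2020.38.6.635
- Artificial intelligence-based harmful bird detection and control system vol.16, pp.1, 2018, https://doi.org/10.13067/jkiecs.2021.16.1.175
- Comparative evaluation of deep learning-based building object extraction techniques using aerial imagery vol.39, pp.3, 2018, https://doi.org/10.7848/ksgpc.2021.39.3.157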