1. Introduction
In computer vision, efforts to remove shadows from images have a long history. Removing shadows makes it possible, for example, to provide users with cleaner maps built from satellite imagery, or to remove dark shadows from a person's face.
Because it can be applied in such a variety of fields, shadow removal has attracted great attention. It is generally divided into two stages: first the shadow is detected, and then the shadow is removed or the detected area is restored. Detection and restoration can be performed as separate steps, or there are methods that perform both at once.
(Figure 1) (left) shadow image, (middle) shadow mask, (right) shadow-free image from the ISTD dataset. In ISTD, the shadow image is created by casting a sunlight shadow with a human body or an object onto a floor that is otherwise shadow-free.
Deep learning using neural networks has recently been successful in various fields of computer vision. Convolutional neural networks (CNNs) have successfully performed image classification, object detection, and image segmentation, with results far surpassing earlier studies that did not use deep learning. This soon became an opportunity to apply deep learning to shadow detection and removal, and various studies began. For shadow detection with deep learning, pixel-wise segmentation is generally used rather than bounding boxes. Studies on shadow removal with deep learning likewise mainly restore the detected area pixel by pixel, or generate the entire image.
1.1 Common shadow data shooting methods
In general, deep learning networks require a very large amount of data during training, and it is known that larger datasets improve performance. Training a network to detect and remove shadows therefore requires suitable data. Shadow detection requires the shadow image, which is the original image containing the shadow, and the shadow mask, which is a binary mask of the shadow region. Shadow removal and restoration require the shadow image and a shadow-free image containing no shadow at all. Detecting and removing shadows at once requires all three types of data. However, shadow data is far more difficult to collect than ordinary data. To build the car class of CIFAR-10, for example, it is only necessary to photograph cars, whereas shooting shadow data is comparatively very hard. In earlier data collection, the shadow mask was often obtained by a person manually segmenting the shadow area in the image, and the shadow-free image was often obtained by waiting for the shadow to disappear. This approach makes it very difficult to collect high-quality data because it is often inaccurate, and because it is so time-consuming the resulting datasets are very small. To improve on this, a widely used method was to fix a camera over a plain, shadow-free floor on a sunny day, shoot a shadow-free image, create the shadow image using an umbrella or a person's arm or accessory, and then obtain the shadow mask by image subtraction of the shadow and shadow-free images. Since a very large amount of data containing all three types can be obtained at once, collection time is greatly reduced compared to the previous method.
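The subtraction step described above can be sketched as follows. This is a minimal illustration, not the exact procedure used by any particular dataset; the inputs are assumed to be aligned 8-bit RGB images loaded as NumPy arrays, and the difference threshold is an assumed tuning parameter:

```python
import numpy as np

def shadow_mask_from_pair(shadow_img, shadow_free_img, threshold=30):
    """Derive a binary shadow mask by subtracting a shadow image from
    its shadow-free counterpart.

    Both inputs are uint8 arrays of shape (H, W, 3). A pixel is marked
    as shadow when its brightness dropped by more than `threshold`.
    """
    # Shadowed pixels are darker, so shadow_free - shadow is positive there.
    diff = shadow_free_img.astype(np.int16) - shadow_img.astype(np.int16)
    # Average the difference over the color channels.
    brightness_drop = diff.mean(axis=2)
    # Pixels whose brightness dropped by more than the threshold are shadow.
    return (brightness_drop > threshold).astype(np.uint8) * 255
```

Because both images come from a fixed camera, no registration step is needed; only the lighting differs between the two shots.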
(Figure 2) Shadow images from the SRD dataset. The first row shows shadow images; the second row shows shadow-free images. Shadows were created using props and people and were shot under strong sunlight; as a result, the penumbra region is hard to see.
However, because this method shoots only on very sunny days, the penumbra region is not sufficiently represented, and weather other than sunny, such as cloudy days, is not considered at all. For this reason, neural networks trained on such data predict poorly on the unseen penumbra regions and on cloudy weather.
1.2 Building and satellite shadow data shooting methods
All the aforementioned methods shoot or collect data with relatively small shadows, such as those of objects or people. However, some shadows are large or wide, such as those in building or satellite images. Training a deep learning network on these also requires all three types of shadow data, but the shadow-creation method described above cannot be used for buildings or satellite images, because a completely shadow-free image cannot be generated. The only option was to shoot both when the shadow existed and when it did not, which is enormously time-consuming. This has made building and satellite shadow datasets very difficult to construct. Some studies used pre-deep-learning data processing methods to generate data based on rough shadow detection, but since not all shadows were detected accurately, the reliability of the data suffered. For these reasons, building datasets for buildings and satellites has been considered a very challenging problem.
In this paper, we propose a method to generate virtual data with the Unity3D graphics tool for building and satellite data that are difficult to produce in reality. After placing 3D objects modeled on real buildings in a virtual space, a lighting effect imitating sunlight was applied to generate virtual shadows, and a virtual camera was placed to compose the desired scene. As a result, all three types of shadow data (shadow image, shadow mask, and shadow-free image) can be obtained with a single shot, and with an added function that shoots automatically from various angles, all three images can be captured in 0.01 seconds. Shooting buildings in various regions this way, about 1,000 data samples could be collected in one hour. Furthermore, by placing the camera very high, virtual image data resembling satellite imagery could be obtained.
The contribution of the method proposed in this study is that it can provide virtual shadow data for buildings and satellites, for which no real datasets exist, and that large amounts of data can be captured quickly. The dataset generated by this method helps deep learning research on building and satellite shadow detection and removal by solving the lack of data, and can also be used for testing before applying a small real-world dataset.
2. Related work
2.1 Traditional methods
The first attempts to detect and remove shadows used color information and edges [1], or histogram thresholds [2]. Dong and Wang [1] attempted to detect shadows by capturing regions that darken rapidly, using color and edges, and Gryka and Terry [2] simply detected very dark parts of the histogram using a threshold. However, natural images present far more varied cases than simple images shot under controlled conditions, which caused several problems: black clothing worn by humans or other black regions were sometimes mistaken for shadows, and, most commonly, detection was performed without properly distinguishing the umbra and penumbra areas.
Since the umbra and penumbra areas of a shadow have different darkness levels, they must be detected separately, but in many cases the two areas are detected as a single shadow, or the penumbra is not considered at all. As a result, when the detected area is restored, either the shadow is not removed in the penumbra, or the penumbra is restored in the same way as the umbra, producing an unnatural result.
(Figure 3) Shadow detection and removal problems in existing methods: (1) normal image, (2) only umbra region removed, (3) penumbra area treated in the same way as the umbra area, (4) shadow edge remains when removing the shadow, (5) ground truth
Figure 3 illustrates these problems with a blue object on a brown floor; the black shadow is the umbra region and the gray shadow is the penumbra region. In (1) we can see that both umbra and penumbra appear when the object is illuminated, and the desired result is (5), with both erased. However, existing methods that consider only the umbra remove only the umbra, as shown in (2), leaving the penumbra intact. If the umbra removal method is applied equally to the penumbra region, the penumbra remains visible and looks unnatural, as in (3). Even when the penumbra is absent, erasing the umbra area can leave an edge, as shown in (4), resulting in an unnatural effect.
2.2 Deep learning methods
Deep learning has recently demonstrated successful performance in many areas of computer vision, and research applying it to shadows has therefore begun. Initial work simply attempted segmentation with networks such as U-Net [3], but recently studies using GANs have been actively conducted. Kuo-Liang and Yi-Ru [4] performed shadow detection using a GAN, Nguyen and Vicente [5] performed shadow removal using a CNN, and Qu and Tian [6] used a GAN that performs shadow detection and removal simultaneously.
The deep learning studies mentioned above used supervised learning, which fundamentally requires a lot of training data. Many networks performed well, but in practice they were limited by their datasets. Table 1 shows the current public datasets. Initially, only very small datasets such as UIUC, LRSS, and UCF were available for deep learning; these are far too small to train on realistically. Considering this, larger datasets such as SRD and SBU were established, but they could not train detection and removal simultaneously: SBU consists of shadow and shadow-mask images, so only detection networks can be trained, while SRD consists of shadow and shadow-free images, so only removal networks can be trained. In quantity they far exceed the other datasets, so each network is easy to train, but end-to-end learning was not possible.
(Table 1) Public shadow datasets
In general, end-to-end learning can train better because removal is learned on top of the detection results. ISTD built a dataset with all three types of data to enable end-to-end learning of detection and removal. However, its roughly 2,000 images are still somewhat lacking for training deep learning networks, and it was built only from images filmed under limited conditions so that the three kinds of shadow data could be obtained simultaneously.
In this paper, a method is proposed for easily generating shadow data that is otherwise difficult to collect. In particular, we aim to capture building and satellite images, which do not exist in the currently released datasets. With the proposed method, building and satellite shadow data can be collected in bulk in a short time, and the three types of data can be obtained in one shot. Using Unity3D, buildings that exist in reality were placed in the Unity space, and shadows were created using lighting effects. This approach shows that not only building and satellite shadow data but also other shadow datasets can be easily collected.
3. Shadow data generation method
3.1 Implementation environment
In this paper, we developed with Unity3D 2018.4.0f1 LTS on the Windows operating system, using a single GTX 1080Ti GPU.
3.2 Building virtual environment
The virtual shadow dataset creation method was implemented in Unity3D, a game engine for creating content such as 2D and 3D video games and architectural visualizations. In Unity3D, 3D objects can be placed in a virtual space; here, 3D objects are imported through the API provided by Vworld, a spatial information open platform. The 3D objects provided by Vworld include buildings that exist in reality and the ground corresponding to them. In this study, 3D ground objects provided by Vworld were placed in the Unity space according to latitude and longitude, and 3D building objects were placed at their locations on the ground with their corresponding textures attached. Through this, the actual ground and buildings could be placed in the Unity virtual space. However, because memory limits prevent loading all objects, only the ground and buildings within a certain range around the camera's location are created.
(Figure 4) (left) Gyeongbokgung Palace taken from Google Maps; (right) photo taken by positioning the camera like a satellite over the Gyeongbokgung Palace area in Unity3D
3.3 Shooting shadow data
We created a virtual camera to shoot the 3D objects provided through the Vworld API in Unity space. The camera moves forward, backward, left, and right with the keyboard (W, A, S, D); as it moves, the terrain is rearranged according to the camera position, and ground and building objects out of range are deleted. The camera can be rotated with the right mouse button, zoomed in and out with mouse drag, and moved faster via the Shift key. All types of shadow images can be taken by pressing the "Capture" button; captured images default to 400 pixels in width and height.
Through these manipulations, the user can place the camera anywhere and take images. The closer the camera is to a building, the more the captured data resembles the simple shadow data of existing datasets; placed far away, it captures the building shadow data intended for this study. In addition, by placing the camera very far above the ground and pointing it vertically downward, images can be shot like satellite imagery.
(Figure 5) Images taken in Unity3D: (left) near, (middle) far away, (right) taken like a satellite
3.4 Shadow data
Three types of shadow data can be shot: the shadow image, shadow-free image, and shadow mask can all be obtained, either all at once for the same scene or one kind at a time. When shooting, the scene in which shadows are cast by the lighting in Unity space is saved as the shadow image. The shadow mask is saved by changing the texture of all objects to white, and the shadow-free image is obtained by changing the textures so that the lighting has no effect and all objects display their original color.
(Figure 6) Three types of images taken from Unity space: (left) shadow image, (middle) shadow mask, (right) shadow-free image
Note that the shadow mask is stored in float units, not as a binary image. Storing it as floats is designed so that deep learning can consider the penumbra as well as the umbra. To obtain a binary mask, the user thresholds it with a specific value; in this study, the threshold was set to 80 or 100.
(Figure 7) (left) float shadow mask, (right) binary shadow mask, the threshold was set to 100
(Figure 8) Automatic shooting function 1: the camera shoots while rotating automatically around the building selected in the first image.
(Figure 9) Automatic shooting function 2: the fixed camera automatically shoots images as the virtual sun in Unity rotates.
3.5 Automatic shooting function 1
In this study, we developed not only building and satellite shadow data but also an automated shooting method. With the shooting method above it is convenient to collect data by placing the camera wherever the user wants, but it is still time-consuming, so we studied how to build datasets faster with automated collection techniques. The first, named "Auto Moving", shoots data at a user-set angle step while rotating horizontally or vertically around a selected building. The camera is constrained to look only at the building while rotating, so it cannot point at the sky or elsewhere. Rotating a full revolution vertically would view the building from below, so vertical rotation is limited to 90 degrees rather than a full revolution. Using this automated technique, rotating horizontally in 30-degree steps around a specific building yields a total of 12 images in one shot; with the angle set to 30 degrees, this means data collection time is reduced by a factor of 12.
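The horizontal orbit of "Auto Moving" amounts to sampling camera positions on a circle around the selected building. The sketch below illustrates only that geometry; the radius, height, and step are user-chosen parameters, and the actual Unity implementation (camera placement and look-at constraint) is not shown here:

```python
import math

def orbit_positions(center, radius, height, step_deg=30):
    """Camera positions for one horizontal revolution around a building.

    With step_deg=30 this yields 360/30 = 12 viewpoints, matching the
    12 images per shot described above. Each camera would be aimed at
    `center`; only the (x, y, z) positions are computed here.
    """
    cx, cy, cz = center
    positions = []
    for angle in range(0, 360, step_deg):
        rad = math.radians(angle)
        positions.append((cx + radius * math.cos(rad),
                          cy + height,           # constant shooting height
                          cz + radius * math.sin(rad)))
    return positions
```

A vertical orbit would sample the elevation angle instead, stopping at 90 degrees so the camera never passes below the building.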
3.6 Automatic shooting function 2
Although rotating the camera around a specific building acquires a large amount of data, it has the disadvantage that the shadow density is the same in every shot, because everything is shot under the same brightness. To address this, this study also developed an automation technique based on lighting. As mentioned earlier, there is lighting based on the real sun and lighting based on a virtual sun that the user can operate, and there is an automation technique for each. With real-sun lighting, the light moves according to the date and duration entered by the user, changing the direction and brightness of the shadow. With virtual-sun lighting, shooting proceeds while the light rotates around the X-axis through 360 degrees at the angle step entered by the user. Through these two automation techniques, shadows of various directions and brightnesses can be shot for a specific building, collecting diverse data very quickly.
4. Conclusion
In this paper, we proposed and developed a method for capturing the three types of shadow data needed to train deep learning networks for shadow detection and removal. Using the Vworld API, 3D objects of real ground and buildings were placed in a virtual space in Unity3D; virtual shadow data was then captured with a camera, and the shadow, shadow-free, and shadow-mask images could be obtained simultaneously by applying different capture methods. In addition, when the camera is positioned like a satellite, satellite images can be shot, not only buildings. Two automation techniques can be applied to shoot shadow data more quickly and diversely, and depending on the setting they are 12 and 24 times faster than normal shooting. Shadow data for buildings and satellites can be collected through the proposed method, but it should be understood that, being virtual, it is always a suboptimal substitute for real data. Nevertheless, it makes deep learning research possible in building and satellite shadow detection and removal, where data did not previously exist, and it can be used for testing before real data becomes available.
Currently, datasets exist for daytime shadows, but none exist for shadows at night, in the red glow of evening, or in the blue light of dawn. In the future, these gaps will be addressed to produce data that can improve the generalization and robustness of deep learning networks.
Because the data is captured in a virtual environment but must cope with real shadows, making it look like reality is very important. However, the textures of the ground and buildings are produced from satellite images, so their resolution is very low, and the surfaces lack material detail such as bumpiness. We will improve these aspects to make the data more realistic.
In terms of automation, currently the camera rotates while the lighting is fixed, or the lighting changes while the camera is fixed. In the future, a technique that changes the brightness of the light and the position of the camera at the same time will be developed, allowing dozens of shots at once and building datasets much faster than today.
☆ This work was supported by the GRRC program of Gyeonggi province. [GRRC KGU 2017-B04, Image/Network-based Intellectual Information Manufacturing Service Research]
☆ A preliminary version of this paper was presented at ICONI 2019.
References
- X Dong, K Wang, "Moving Object and Shadow Detection Based on RGB Color Space and Edge Ratio", 2nd International Congress on Image and Signal Processing, pp. 1-5, 2009. https://doi.org/10.1109/cisp.2009.5301770
- M Gryka, M Terry, G Brostow, "Learning to Remove Soft Shadows", ACM Transactions on Graphics, Vol. 34, No. 5, pp. 1-15, 2015. https://doi.org/10.1145/2732407
- R Guo, Q Dai, D Hoiem, "Paired Regions for Shadow Detection and Removal", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 35, No. 12, pp. 2956-2967, 2013. https://doi.org/10.1109/tpami.2012.214
- K Chung, Y Lin, Y Huang, "Efficient Shadow Detection of Color Aerial Images Based on Successive Thresholding Scheme", IEEE Transactions on Geoscience and Remote Sensing, Vol. 47, No. 2, pp. 671-682, 2009. https://doi.org/10.1109/tgrs.2008.2004629
- V Nguyen, T Vicente, "Shadow Detection with Conditional Generative Adversarial Networks", IEEE International Conference on Computer Vision, pp. 4510-4518, 2017. https://doi.org/10.1109/iccv.2017.483
- L Qu, J Tian, S He, "DeshadowNet: A Multi-context Embedding Deep Network for Shadow Removal", IEEE Conference on Computer Vision and Pattern Recognition, pp. 4067-4075, 2017. https://doi.org/10.1109/cvpr.2017.248
- O Ronneberger, P Fischer, T Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation", In Lecture Notes in Computer Science Springer International Publishing, pp. 234-241, 2015. https://doi.org/10.1007/978-3-319-24574-4_28
- P Sarabandi, F Yamazaki, M Matsuoka, A Kiremidjian, "Shadow detection and radiometric restoration in satellite high resolution images", IEEE International Geoscience and Remote Sensing Symposium, Vol. 6, pp. 3744-3747, 2004. https://doi.org/10.1109/igarss.2004.1369936
- T Vicente, L Hou, "Large-Scale Training of Shadow Detectors with Noisily-Annotated Shadow Examples", European Conference on Computer Vision, pp. 816-832, 2016. https://doi.org/10.1007/978-3-319-46466-4_49
- J Wang, X Li, J Yang, "Stacked Conditional Generative Adversarial Networks for Jointly Learning Shadow Detection and Shadow Removal", IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1788-1797, 2018. https://doi.org/10.1109/cvpr.2018.00192
- J Zhu, K Samuel, S Masood, M Tappen, "Learning to recognize shadows in monochromatic natural images", IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 223-230, 2010. https://doi.org/10.1109/cvpr.2010.5540209
- M Ji, J Chun, "A Sketch-based 3D Object Retrieval Approach for Augmented Reality Models Using Deep Learning", Journal of Internet Computing and Services, Vol. 21, No. 1, pp. 33-43, 2020. https://doi.org/10.7472/JKSII.2020.21.1.33
- H Yoon, K Kim, J Chun, "GAN-based shadow removal using context information", Journal of Internet Computing and Services, Vol. 20, No. 6, pp. 29-36, 2019. https://doi.org/10.7472/JKSII.2019.20.6.29
- T Khanam, K Deb, "Baggage Recognition in Occluded Environment using Boosting Technique", KSII Transactions on Internet and Information Systems, Vol. 11, No. 11, pp. 5436-5458, 2017. https://doi.org/10.3837/tiis.2017.11.014
- W Liu, J Hu, "Tongue Image Segmentation via Thresholding and Gray Projection", KSII Transactions on Internet and Information Systems, Vol. 13, No. 2, pp. 945-961, 2019. https://doi.org/10.3837/tiis.2019.02.025