1. Introduction
Rapid technical improvement in mobile devices has brought augmented reality (AR) techniques to smartphones, allowing users to interact virtually with their surroundings. AR, which interactively overlays computer-generated information onto the real world, employs an optical system such as a camera and a display device [1]. A mobile AR experience draws on many techniques from electronics engineering and computer science, including tracking and registration, wearable input and interaction technology, and wireless networking [2-5]. For the registration process, which aligns virtual elements with physical objects, previous studies explored ways to make the augmented world as realistic as possible [6-14]. They aimed at a seamless integration of virtual objects into the real world.
Naemura et al. identified the consistency of geometry, time (a synchronized world that facilitates smooth interaction), and illumination as important issues for AR applications [15]. When real and virtual objects are merged, the environmental illumination should be estimated in order to generate realistic images [16,17]. Without shadows, virtual objects appear to float, which makes the environment look unrealistic [14]. This paper focuses on generating virtual shadows, an essential element for improving visual perception and realism in AR applications. By reconstructing the illumination distribution of a scene, we can project realistic shadows of a virtual object onto the real scene.
Agusanto et al. introduced an image-based lighting method that acquires the environment illumination and generates rendering images with a global illumination rendering model. However, the environment map must be pre-filtered before images can be rendered with the surrounding light sources [6]. Havran et al. captured environment maps over a hemisphere with a high dynamic range (HDR) camcorder to generate temporally coherent illumination [8]. Son et al. employed an omnidirectional camera to acquire the illumination distribution, including the directions and color intensities of the light sources [12]. These systems require additional camera equipment with an omnidirectional field of view. Barreira et al. proposed a sensor-based method to estimate outdoor illumination for mobile AR. The method combines measurements from an ambient sensor and the global positioning system (GPS) with a weather application programming interface (API) provided as a network service [4]. Arief et al. introduced illumination direction estimation for a mobile AR system [11]. Because their method extracts distinct contours from the shadowed region, it can obtain only the direction of a single strong light source. Therefore, it cannot analyze multiple shadows generated by many light sources in the real world.
Sato et al. described a method to recover the illumination distribution of a scene from image brightness inside shadows cast by an object of known shape [9]. They solved a linear equation defined by pixel values, scene parameters, and unknown illumination radiance using the linear least-squares algorithm. The more samples are used, the higher the quality of the generated shadows, so the method needs a sufficiently large number of image pixels. If only a limited number of sampling points inside the shadow areas is examined, the method fails to provide a correct estimate of the illumination distribution. To overcome this instability problem, Sato et al. partitioned the shadow surface around a reference point into clusters and selected one pixel from each cluster [10]. In addition, because Sato et al. did not consider the distance between the camera and the shadow surface, the number of light sources considered in the scene remained unchanged. In this case, an uneven sample distribution is obtained regardless of the number of sample points, and an uneven sample distribution makes the numerical computation unstable.
This paper presents two novel AR methods to estimate the environment illumination distribution. The first method uses the orientation sensor of a mobile device to extract lighting directions and intensities from input images captured with the front-side camera. The second method examines shadow areas cast by a three-dimensional (3D) AR marker of known shape and size. Every light source in the scene is positioned on a geodesic dome, and the shadow surface of the 3D AR marker is partitioned into clusters as in Ref. [9]. By applying a logical AND operation between two binarized regions, the previously clustered shadow regions and the thresholded real shadow regions, we can efficiently compute relative area proportions according to threshold values. The obtained illumination distribution is used to generate natural rendering images with multiple smooth shadows. The proposed system can provide correct perception from the user's viewpoint.
2. Proposed Method
In an interactive service application, photo-realism means photometric registration and visual coherence between the real and the virtual elements. For seamless integration of a synthetic object within the image, the proposed method estimates the illumination distribution surrounding the virtual object and generates virtual shadows for AR application.
2.1 Using Orientation Sensor of Mobile Device
In order to generate correct lighting effects in a scene, the proposed system extracts the light information (direction and intensity) from an input image captured by the front-side camera of a mobile device. Recently released mobile devices such as smartphones and tablet-PCs carry various sensors, including GPS, gyroscope, and accelerometer, to determine their positions and viewing directions.
To capture widely distributed light sources, the proposed method takes snapshots of the surrounding areas, such as the ceiling, with the front-side camera. The viewing direction of the mobile device is measured with the orientation sensor of a commercial tablet-PC. The device must be fixed at a reference position because its orientation parameters are measured relative to an initial pivot point. Inaccurate initialization of the pivot point can cause errors in the light direction estimation procedure.
By binarizing the input images with a threshold value, we obtain regions with high intensity values. A labeling algorithm is then employed to detect connected regions in the binarized images. After binarizing and labeling, we detect bright regions of sufficient size (width of 100 to 200 pixels and height of 50 to 100 pixels) and high intensity. The detected lights and their incoming directions in a spherical coordinate system are passed to the mobile rendering module, and multiple smooth shadows are generated from the environment light sources in real time. Fig. 1 shows the proposed mobile AR system. The 3D synthetic space is constructed in a spherical coordinate system whose origin is positioned at the AR marker's center, as shown in Fig. 2. Here the light source location is represented by its azimuth and elevation.
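The binarize-and-label step can be sketched as follows. This is a minimal illustration in Python, not the implementation running on the tablet: the function name `detect_bright_regions`, the default threshold of 200, and the use of 4-connectivity are assumptions, while the default size limits follow the ranges given above.

```python
from collections import deque


def detect_bright_regions(gray, threshold=200,
                          min_w=100, max_w=200, min_h=50, max_h=100):
    """Return bright connected regions whose bounding box fits the size limits.

    gray is a list of rows holding 0..255 intensities.
    The threshold default is an assumed value, not one given in the text.
    """
    h = len(gray)
    w = len(gray[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y0 in range(h):
        for x0 in range(w):
            if seen[y0][x0] or gray[y0][x0] < threshold:
                continue
            # Breadth-first flood fill labels one 4-connected bright region.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            xs, ys, total = [], [], 0
            while queue:
                x, y = queue.popleft()
                xs.append(x)
                ys.append(y)
                total += gray[y][x]
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < w and 0 <= ny < h
                            and not seen[ny][nx] and gray[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            box_w = max(xs) - min(xs) + 1
            box_h = max(ys) - min(ys) + 1
            # Keep only regions of "enough size", as in the size test above.
            if min_w <= box_w <= max_w and min_h <= box_h <= max_h:
                regions.append({
                    "centroid": (sum(xs) / len(xs), sum(ys) / len(ys)),
                    "size": (box_w, box_h),
                    "mean_intensity": total / len(xs),
                })
    return regions
```

The centroid of each returned region is the pixel position whose incoming direction is then expressed in spherical coordinates, and the mean intensity can serve as the light intensity.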
Fig. 1. Proposed algorithm.
Fig. 2. Light sources on virtual hemisphere [13].
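As an illustration of how a detected bright pixel can be turned into an azimuth and elevation, the following sketch assumes an ideal pinhole front-side camera and a 3×3 rotation matrix `R` (device to marker coordinates) that has already been derived from the orientation sensor. The function names, the focal length parameter, and the angle convention (azimuth in the x-y plane measured from +x, elevation from that plane toward +z) are assumptions and are not taken from the system described here.

```python
import math


def pixel_to_camera_ray(u, v, width, height, focal_px):
    """Unit ray through pixel (u, v) of a pinhole camera looking along +z."""
    x = u - width / 2.0
    y = v - height / 2.0
    n = math.sqrt(x * x + y * y + focal_px * focal_px)
    return (x / n, y / n, focal_px / n)


def rotate(R, v):
    """Apply a 3x3 rotation matrix (list of rows) to a 3-vector."""
    return tuple(sum(R[r][c] * v[c] for c in range(3)) for r in range(3))


def to_azimuth_elevation(d):
    """Convert a direction vector to (azimuth, elevation) in degrees."""
    x, y, z = d
    n = math.sqrt(x * x + y * y + z * z)
    azimuth = math.degrees(math.atan2(y, x)) % 360.0
    elevation = math.degrees(math.asin(max(-1.0, min(1.0, z / n))))
    return azimuth, elevation


# Usage: with the device at its reference pose (identity rotation, assumed),
# the image center maps to the zenith of the virtual hemisphere.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
ray = pixel_to_camera_ray(320, 240, 640, 480, focal_px=500.0)
azimuth, elevation = to_azimuth_elevation(rotate(identity, ray))
```

Keeping the rotation as an explicit input separates the sensor-specific conventions from the geometry, so an error in the pivot initialization shows up only in `R`.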
The number of light sources is determined by considering the computation performance of the mobile rendering engine. Since a light source in the graphics pipeline is a zero-sized point, the detected light source regions are sampled using a Poisson model to render natural shadows with smooth boundaries.
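One way to turn a detected light region into several point lights is Poisson-disk sampling by dart throwing, sketched below. Reading the Poisson model as Poisson-disk sampling is an assumption, as are the function name, the rectangular region, and all parameter values.

```python
import random


def poisson_disk_samples(x0, y0, width, height, min_dist, count,
                         max_tries=2000, seed=0):
    """Pick up to `count` points in a rectangle, no two closer than min_dist.

    Candidates are drawn uniformly and rejected when they fall too close to
    an already accepted point (simple dart throwing).
    """
    rng = random.Random(seed)
    points = []
    tries = 0
    while len(points) < count and tries < max_tries:
        tries += 1
        p = (x0 + rng.random() * width, y0 + rng.random() * height)
        if all((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 >= min_dist ** 2
               for q in points):
            points.append(p)
    return points


# Usage: eight point lights spread over a 100x50 pixel light region.
samples = poisson_disk_samples(0, 0, 100, 50, min_dist=10, count=8)
```

The minimum-distance constraint keeps the point lights from clumping, so the overlapping hard shadows blend into a smoother penumbra than purely random samples would give.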
The mobile system detects the AR marker in an input image captured by the back-side camera using the NyARToolkit module [18], and constructs the model view matrix of the camera. Like ARToolKit, NyARToolkit provides both marker-based and natural feature tracking; however, the software has been optimized for portability across different programming languages. NyARToolkit runs on virtual machine platforms such as Java, and it has derived projects for Flash and Android. The estimated environment illumination supplies the direction and intensity of each light source in the rendering process. The proposed system employs a commercial tablet-PC to detect the AR marker and to generate rendering images with OpenGL ES. A frame buffer object (FBO) is used to generate multiple overlapping shadows.
2.2 Using 3D AR Marker and Its Shadow Regions
Shadows are caused by the occlusion of incoming light in a scene. In general, image brightness inside shadows has great potential for providing distinct clues to the environmental illumination distribution.
Arief et al. extracted the shadow contour to estimate the illumination direction under the assumption that there is a single dominant illumination source in the scene [11]. Although salient shadow boundaries are present in a real image, the detected shadow regions depend strongly on the threshold value, as shown in Fig. 3. The virtual object, generated from the relationship between the AR marker and the camera system, is colored red.
Fig. 3. Input and result images with threshold values (70 and 150).
Previous pixel-based methods that examine shadow intensity are sensitive to the number of sampling points and their positions. In addition, it is difficult to determine an optimal threshold value for shadow detection [19]. These sampling and detection limitations degrade system performance.
To estimate the camera parameters (translation and rotation) and extract the shadow regions cast by multiple light sources, we employ a 3D AR marker of known shape and size. As in Ref. [9], each environmental illumination radiance is located at the centroid of a polygon of a geodesic dome, so that the illumination distribution of the scene is represented by a set of point sources equally distributed on the dome. We refer to the surface onto which the occluding object casts shadows as the shadow surface. The proposed method can obtain in advance the shadow regions cast by the CG light sources and an occluding 3D CG object of known shape. The shadow surface is partitioned into clusters based on which light sources the occluding object blocks. In other words, the part of the shadow surface shadowed by the same CG light source forms one cluster, called a shadow mask, and the number of shadow masks equals the number of CG light sources. The relationship between the AR marker and the camera coordinate system is determined by using ARToolKit.
In Fig. 3, as the threshold value increases gradually, larger shadow regions are obtained from the real image. By applying a binary logic operation (AND) between the previously clustered shadow region mask and the thresholded real shadow region, we can efficiently compute the ratio of the two areas at each threshold level. The obtained relative proportions provide a robust criterion for determining whether the shadow cast by each light source is present in the real shadow region. We define a function of the relative area ratio in Equation (1). For the i-th shadow mask, the proposed method obtains a proper threshold value t*_i that maximizes the discrete difference of f_i(t) over a threshold level step Δt as follows:
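A form of Equation (1) that is consistent with this description is given below. It is reconstructed from the surrounding prose only; the symbol $D_i(t)$ for the discrete difference and the explicit range of $t$ are assumptions.

```latex
t_i^{*} = \arg\max_{t \,\in\, \{t_{\min}+\Delta t,\; \dots,\; t_{\max}\}} D_i(t),
\qquad
D_i(t) = f_i(t) - f_i(t - \Delta t)
\tag{1}
```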
where f_i(t) represents the i-th relative area proportion at the threshold value t. Δt is set to 10, and the minimal threshold value t_min and the maximal value t_max are set to 10 and 250, respectively. The proposed method checks whether the magnitude of the discrete difference of f_i(t) exceeds a threshold value (experimentally set to 4); that is, it checks whether the current area proportion is more than four times the previous proportion. More specifically, the incremental slope of f_i(t) must exceed 75.96 degrees, which is more than five sixths of 90 degrees. This slope threshold is computed from the magnitude of the discrete difference per difference step: tan^-1(4/1). We regard 75 degrees as a very steep slope and take it to indicate a sufficient increase of the thresholded real shadow region at that threshold level. The threshold value acts as a sensitivity parameter in light source estimation: if a value lower than 4 is used, relatively unimportant light sources may be detected. By excluding the regions that are insufficiently shadowed in a final stage, we can remove unimportant light sources.
When the camera is far from the shadow surface, some cluster regions have areas too small for their presence in the real shadow region to be measured precisely. The proposed method therefore considers the distance between the camera and the shadow surface and excludes cluster regions with too small an area from illumination estimation. To improve system performance, we first examine the visible area of each shadow cluster region and exclude the regions that are totally or largely occluded by the 3D AR marker. Second, the proposed method evaluates the average of the relative area proportions at each threshold value and removes the shadow cluster regions whose area ratio is below the average, since these regions are not sufficiently shadowed.
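The region-based test of this section can be sketched in a few functions. The sketch assumes that a pixel counts as real shadow when its intensity is below the threshold, that the relative area proportion is the share of mask pixels in shadow expressed in percent, and that the below-average filter is applied at one chosen threshold; these choices, the function names, and the `min_area` parameter are assumptions.

```python
def area_ratio_curve(gray, mask, t_min=10, t_max=250, dt=10, min_area=1):
    """Relative area proportion (percent) of one shadow mask per threshold.

    gray: rows of 0..255 intensities; mask: rows of booleans of the same
    shape. Returns {} when the visible mask area is below min_area.
    """
    mask_pixels = [gray[y][x]
                   for y, row in enumerate(mask)
                   for x, inside in enumerate(row) if inside]
    if len(mask_pixels) < min_area:
        return {}
    curve = {}
    for t in range(t_min, t_max + 1, dt):
        # AND of the binary mask with the thresholded real shadow image.
        overlap = sum(1 for v in mask_pixels if v < t)
        curve[t] = 100.0 * overlap / len(mask_pixels)
    return curve


def select_threshold(curve, dt=10, min_jump=4.0):
    """Threshold with the largest discrete difference, if it exceeds min_jump."""
    best_t, best_jump = None, min_jump
    for t in sorted(curve):
        if t - dt in curve:
            jump = curve[t] - curve[t - dt]
            if jump > best_jump:
                best_t, best_jump = t, jump
    return best_t


def keep_sufficiently_shadowed(curves, t):
    """Indices of masks whose proportion at threshold t is not below average."""
    values = {i: c[t] for i, c in curves.items() if t in c}
    if not values:
        return []
    average = sum(values.values()) / len(values)
    return sorted(i for i, v in values.items() if v >= average)
```

Because every quantity is an area count over a whole mask, a few noisy pixels change the curve only slightly, which is the motivation for preferring regions over individual sample points.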
In rendering, one area light source, such as an incandescent lamp, is approximated with multiple point light sources. The detected light sources on the polygons of the geodesic dome are therefore sub-sampled to represent area lights in the real world effectively. To select sample positions for the detected light sources in a reasonable way, we examine three factors: the relative area proportion and the threshold level at which the light source is first detected, and the detection duration of the light source as the threshold value increases. In other words, important light sources are detected over a long range from a low to a high threshold level and have large area proportions. The proposed method assigns larger weights to important light sources, so that more sample points are selected for them.
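How the three factors are combined into a weight is not specified, so the sketch below uses a simple product as a placeholder and then distributes a fixed budget of sub-samples in proportion to the weights. The weight formula, the guarantee of at least one sub-sample per detected light, and the function names are all assumptions.

```python
def light_weight(area_ratio, first_threshold, duration_steps, t_max=250):
    """Hypothetical importance weight built from the three factors.

    Larger area proportion, earlier first detection (lower threshold), and
    longer detection duration all increase the weight.
    """
    earliness = 1.0 - first_threshold / float(t_max)
    return area_ratio * earliness * duration_steps


def allocate_subsamples(weights, total):
    """Split `total` sub-samples across lights in proportion to weights.

    Every light receives at least one sub-sample; the rest is distributed
    with the largest-remainder rule so that the counts sum to `total`.
    """
    n = len(weights)
    if n == 0 or total < n:
        raise ValueError("need at least one sub-sample per light source")
    s = float(sum(weights))
    if s <= 0.0:
        weights, s = [1.0] * n, float(n)
    remaining = total - n
    shares = [w / s * remaining for w in weights]
    counts = [1 + int(x) for x in shares]
    leftover = max(0, total - sum(counts))
    by_remainder = sorted(range(n), key=lambda i: shares[i] - int(shares[i]),
                          reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts
```

With a budget fixed in advance, the rendering cost stays constant regardless of how many light sources are detected, and only the distribution of sub-samples changes.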
The proposed region-based method can overcome the shadow detection problems caused by sampling noise and threshold value selection. Virtual shadows are rendered with the illumination distribution estimated from the real scene.
3. Experimental Results
The experimental equipment includes a PC with a 3.4 GHz CPU and a commercial tablet-PC (Samsung Galaxy Tab 10.1). The rendering performance depends on three factors: the number of sub-samples per light source, the FBO resolution, and the number of light sources. Fig. 4 shows the rendering images generated with different parameter configurations; in each comparison, one parameter is varied while the other two are fixed. Fig. 4 (a) shows rendering images with 2 to 4 light sources, where the number of sub-samples per light source and the FBO resolution are fixed to two and 1024×1024. For example, the third image of Fig. 4 (a) is rendered with four light sources. Fig. 4 (b) shows rendering images when the number of sub-samples per light source is set to 2 to 4, with the number of light sources and the FBO resolution fixed to two and 1024×1024. The images in these two cases (Fig. 4 (a) and (b)) are rendered at 15 to 20 frames per second (fps). Fig. 4 (c) shows the effect of the FBO resolution (256×256, 512×512, and 1024×1024), with the number of light sources and the number of sub-samples per light source both fixed to three. The images in the third case (Fig. 4 (c)) are rendered at 13 to 14 fps.
Fig. 4. Comparison of rendering images.
Table 1 shows the rendering performance for different sub-sampling numbers and FBO resolutions in the case of three light sources. Fig. 5 shows the implementation on a commercial tablet-PC. Fig. 6 shows our experimental setup for evaluating the accuracy of the estimated light positions (in the green box). Two real light directions (elevation angle θ and azimuth angle φ in spherical coordinates) relative to the marker's center are measured manually as ground truth. Table 2 shows that the estimated lighting directions are similar to the measured lighting directions of the real area light sources.
Table 1. Rendering performances (fps).
Fig. 5. Implementation on commercial tablet-PC.
Fig. 6. (a) Experimental setup and (b) detected light sources by front-side camera; Comparison of (c) captured image and (d) rendering image.
Table 2. Comparison of measured and estimated light directions.
In Fig. 6 (c) and (d), rendered shadows in the final image are almost the same as in the real picture captured by a commercial tablet-PC. As shown in Table 2, inaccurate initialization of the pivot point deteriorates orientation estimation performance.
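A single number for the agreement between a measured and an estimated direction is the angle between the two unit vectors. The sketch below shows one way to compute it from (θ, φ) pairs; the axis convention and the function names are assumptions, and no values from Table 2 are used.

```python
import math


def direction(elevation_deg, azimuth_deg):
    """Unit vector for elevation theta and azimuth phi, both in degrees."""
    el = math.radians(elevation_deg)
    az = math.radians(azimuth_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))


def angular_error_deg(measured, estimated):
    """Angle in degrees between two (elevation, azimuth) directions."""
    a = direction(*measured)
    b = direction(*estimated)
    dot = sum(p * q for p, q in zip(a, b))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))
```

Comparing angles between vectors avoids the distortion of comparing azimuth values directly, since an azimuth difference matters less as the elevation approaches 90 degrees.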
In the second method, the illumination distribution of the scene is represented by a set of point sources equally distributed on a geodesic dome of 80 triangles. Light sources on polygons that lie under or on the shadow surface cannot contribute to shadow generation, so the proposed method excludes these polygons and considers the 36 lights at the centroids of the remaining polygons.
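The dome geometry can be reproduced with a short sketch. Subdividing each face of an icosahedron once gives 80 triangles; the choice of an icosahedron as the base solid, its orientation relative to the shadow surface (taken here as the plane z = 0), and all function names are assumptions. With this orientation, 8 centroids lie on the plane and 36 lie above it, which agrees with the number of lights considered above.

```python
import itertools
import math


def icosahedron():
    """Vertices and faces of an icosahedron with edge length 2."""
    p = (1.0 + math.sqrt(5.0)) / 2.0
    verts = []
    for a in (-1.0, 1.0):
        for b in (-p, p):
            verts += [(0.0, a, b), (a, b, 0.0), (b, 0.0, a)]

    def is_edge(i, j):
        return abs(math.dist(verts[i], verts[j]) - 2.0) < 1e-9

    # Every triple of mutually adjacent vertices is a face.
    faces = [f for f in itertools.combinations(range(12), 3)
             if all(is_edge(i, j) for i, j in itertools.combinations(f, 2))]
    return verts, faces


def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def midpoint(a, b):
    """Midpoint of two unit vectors, pushed back onto the unit sphere."""
    return normalize(tuple((x + y) / 2.0 for x, y in zip(a, b)))


def geodesic_light_directions():
    """Unit directions of the 80 triangle centroids of the subdivided dome."""
    verts, faces = icosahedron()
    verts = [normalize(v) for v in verts]
    lights = []
    for i, j, k in faces:
        a, b, c = verts[i], verts[j], verts[k]
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        for tri in ((a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)):
            centroid = tuple(sum(q[d] for q in tri) / 3.0 for d in range(3))
            lights.append(normalize(centroid))
    return lights


def lights_above_surface(lights, eps=1e-9):
    """Drop lights on or under the shadow surface (the plane z = 0)."""
    return [d for d in lights if d[2] > eps]
```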
Fig. 7 shows an input image of the 3D AR marker and the initial setup data: the clustered shadow regions and the mask image for the shadow surface. The proposed method obtains the shadow regions cast by the CG light sources and an occluding 3D AR marker of known shape. In other words, the obtained shadow masks represent the areas occluded by the 3D AR marker. The number of shadow masks is the same as that of the CG light sources, and each shadow mask is drawn in a unique color in Fig. 7 (b). The proposed method examines the area ratios between the detected shadows and the shadow masks of the light sources on the geodesic dome. Here the 3D AR marker area and the outer regions are removed with a mask image that takes the relationship between the AR marker and the camera system into account. Fig. 7 (c) shows the mask image for shadow surface detection.
Fig. 7. (a) Input image; (b) clustered shadow regions; (c) mask image for shadow surfaces.
Fig. 8 shows the shadow regions detected as the threshold value increases in steps of 20 intensity levels, where the CG object is colored red. Fig. 9 shows the distribution of area ratios between the previously clustered shadow regions and the thresholded real shadow regions according to threshold level. Here the threshold value is increased from 10 to 250 in steps of 10 intensity levels. To improve system performance, we examine the average of the relative area proportions at each threshold level and remove the shadow cluster regions whose area ratio is below the average, as shown in Fig. 9 (c), because these regions are not sufficiently shadowed. Then we check whether the derivative magnitude of f_i(t) is more than 4, which means that the current area proportion is more than four times the previous proportion. After proper threshold values for the light sources are obtained, we exclude the regions that are insufficiently shadowed.
Fig. 8. (a) Result images with increasing threshold values (20, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220, and 240).
Fig. 9. (a) and (b) Relative area ratios between clustered shadow regions and threshold shadow regions; (c) cluster regions with sufficient real shadows; (d) detected light sources.
By computing the derivative of the relative area proportions with respect to the threshold value, we obtain four light sources on the geodesic dome, as shown in Fig. 9 (d). In terms of the polygon indices of the geodesic dome, the four detected light sources are adjacent to one another, as shown in Fig. 10 (a). Here the 9th, 13th, 14th, and 24th light sources are colored blue, red, green, and yellow, respectively, and the detected light sources are transformed for better visibility. To render more natural images, the proposed method generates 30 sub-sampling points among the four detected light sources according to their relative importance weights. In Fig. 10 (b), the 9th light source (the brightest polygon) has 26 sub-samples, and the 13th, 14th, and 24th light sources have 2, 1, and 1 sub-samples, respectively. In the final rendering image (Fig. 10 (c)), the distribution of the generated shadows is similar to that of the real image (Fig. 7 (a)).
Fig. 10. (a) and (b) Detected light sources on geodesic dome; (c) rendering image.
To demonstrate the improved performance of the proposed method, we include experimental results obtained with Sato's method [9], whose contributions to illumination estimation are well recognized. This method derives a linear formula relating the illumination radiance of the scene to the pixel values of the shadow image.
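In generic notation, the linear formulation can be summarized as below. The symbols are introduced here for illustration and are not taken from [9]: $p_j$ is the pixel value at the j-th sampled shadow point, $L_i$ is the unknown radiance of the i-th light source on the geodesic dome, and $a_{ji}$ collects the known scene parameters, with $a_{ji} = 0$ when the occluding object blocks light source $i$ at point $j$.

```latex
p_j = \sum_{i=1}^{n} a_{ji}\, L_i , \quad j = 1, \dots, m,
\qquad
\hat{L} = \arg\min_{L} \; \lVert A L - p \rVert_2^{2},
\quad A = (a_{ji}) \in \mathbb{R}^{m \times n}
```

When the sampled points do not cover every occlusion pattern, several columns of $A$ become identical or nearly so, and the least-squares solution is no longer well determined. This is the instability examined in the comparison that follows.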
As shown in Fig. 11, there is one light source in the input shadow image. Here the red dots represent the sample points for illumination radiance estimation. In the first case, the shadow surface is sampled randomly with 40 points, and points inside the 3D marker area are excluded. Such a small number of sample points does not guarantee that every direction of the geodesic dome is sampled. Because Sato's method estimates the illumination radiance from the sampled pixels in the shadow region, the numerical instability of the linear formula increases when some shadow regions contain no sampling points. As a result, the shadow direction in the final rendering image (Fig. 11 (c)) differs from that in the input image (Fig. 11 (a)). It is important to select image pixels from the shadow image so as to maximize the variation in the patterns of light source occlusion by the occluding object. In the second case, the shadow surface is sampled regularly with 400 points. By sampling the shadow surface densely, we can include pixel values for every illumination radiance. As shown in Fig. 11 (d), even though the direction of the estimated light source is somewhat similar to that of the input light source, the number of sample points that fall in the shadow area strongly affects the final estimation performance; the light source with the most sample points in the shadow area is likely to be chosen. Because Sato's method is based on point sampling and is sensitive to the number of sampling points and their positions, it sometimes fails to provide a correct estimate of the illumination distribution.
Fig. 11. (a) and (b) Input images with sample points (40 and 400); (c) and (d) rendering images in two cases.
4. Conclusion
Rapid improvement in mobile computation performance has enabled many mobile AR methods for realistic image rendering. This paper presents two methods to estimate the environment illumination distribution of a scene. In the first method, after capturing images with the front-side camera, we detect light regions and their directions using the orientation sensor of a commercial tablet-PC, which demonstrates the effectiveness of sensors mounted on a mobile device (cameras and orientation sensors) for estimating the illumination distribution of a scene. The second method examines the area ratios between the detected shadows and the regions clustered by the light sources of a geodesic dome. By computing the derivative of the relative area proportions with respect to the threshold value, we obtain suitable threshold values and precise light source positions. The estimated information is used to generate realistic images with multiple smooth shadows. The proposed area-based method overcomes the point sampling problem and the difficulty of selecting a threshold value for extracting shadowed regions. Because mobile display devices have different rendering characteristics, our future research will include a reproduction technique for more realistic rendering under changing environment illumination.
References
- R. Azuma, "A survey of augmented reality," Presence: Teleoperators and Virtual Environments, vol. 6, no. 4, pp. 355-385, 1997. https://doi.org/10.1162/pres.1997.6.4.355
- G. Reitmayr and T. W. Drummond, "Going out: robust model-based tracking for outdoor augmented reality," in Proc. of IEEE/ACM International Symposium on Mixed and Augmented Reality, pp. 109-118, October 22-25, 2006.
- D. Schmalstieg and D. Wagner, "Experiences with handheld augmented reality," in Proc. of IEEE/ACM International Symposium on Mixed and Augmented Reality, pp. 3-15, November 14-17, 2007.
- J. Barreira, L. Magalhaes, and M. Bessa, "A sensor based approach to outdoor illumination estimation for augmented reality applications on mobile devices," in Proc. of Eurographics, pp. 3-4, May 6-10, 2013.
- T. Rossler, S. Rogge, and C. Hentschel, "A case study: Mobile augmented reality system for visualization of large buildings," in Proc. of IEEE International Conference on Consumer Electronics-Berlin, Berlin, Germany, pp. 311-314, September 6-8, 2011.
- K. Agusanto, L. Li, Z. Chuangui, and N. W. Sing, "Photorealistic rendering for augmented reality using environment illumination," in Proc. of IEEE International Symposium on Mixed and Augmented Reality, pp. 208-216, October 7-10, 2003.
- M. Haller, "Photorealism or/and non-photorealism in augmented reality," in Proc. of ACM SIGGRAPH International Conference on Virtual Reality Continuum and Its Applications in Industry, pp. 189-196, June 16-18, 2004.
- V. Havran, M. Smyk, G. Krawczyk, K. Myszkowski, and H. Seidel, "Interactive system for dynamic scene lighting using captured video environment maps," in Proc. of Eurographics Symposium on Rendering, pp. 31-42, June 29-July 1, 2005.
- I. Sato, Y. Sato, and K. Ikeuchi, "Illumination from shadows," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 25, no. 3, pp. 290-300, 2003. https://doi.org/10.1109/TPAMI.2003.1182093
- I. Sato, Y. Sato, and K. Ikeuchi, "Stability issues in recovering illumination distribution from brightness in shadows," in Proc. of IEEE Computer Vision and Pattern Recognition, vol. 2, pp. 400-407, December 8-14, 2001.
- I. Arief, S. McCallum, and J. Y. Hardeberg, "Realtime estimation of illumination direction for augmented reality on mobile devices," in Proc. of IS&T Color and Imaging Conf., pp. 111-116, November 12-16, 2012.
- W. Son, B. Nam, T. Kim, and H. Hong, "Using environment illumination information in mobile augmented reality," in Proc. of IEEE International Conference on Consumer Electronics, pp. 588-589, January 13-16, 2012.
- Y. Jung, E. Choi, and H. Hong, "Using orientation sensor of smartphone to reconstruct environment lights in augmented reality," in Proc. of IEEE International Conference on Consumer Electronics, pp. 53-54, January 10-13, 2014.
- Z. Noh and M. S. Sunar, "Soft shadow rendering based on real light source estimation in augmented reality," Advances in Multimedia – An International Journal, vol. 1, no. 2, pp. 26-36, 2010.
- T. Naemura, T. Nitta, A. Mimura, and H. Harashima, "Virtual shadows - enhanced interaction in mixed reality environment," in Proc. of IEEE Virtual Reality, pp. 293-294, March 24-28, 2002.
- A. Panagopoulos, T. F. Vicentre, and D. Samaras, "Illumination estimation from shadow borders," in Proc. of IEEE International Conf. on Computer Vision, pp. 798-805, November 6-13, 2011.
- S. Ryu, S. Lee, S. Ahn, and J. Park, "Tangible video teleconference system using real-time image-based relighting," IEEE Trans. on Consumer Electronics, vol. 55, no. 3, pp. 1162-1168, 2009. https://doi.org/10.1109/TCE.2009.5277971
- http://nyatla.jp/nyartoolkit.
- X. Jiang, A. Schofield, and J. Wyatt, "Shadow detection based on colour segmentation and estimated illumination," in Proc. of British Machine Vision Conference, pp. 87.1-87.11, August 20-September 1, 2011.