1. Introduction
Image completion methods are used to complete target areas (i.e., “holes”) in digital images. From a computational perspective, this is a difficult problem, as the completed image must be plausible and consist of the appropriate shapes and textures. Existing image completion methods may be roughly divided into two main categories: diffusion-based and exemplar-based.
Diffusion-based methods [1] complete missing image regions using thermal diffusion equations that propagate image information from surrounding areas into the damaged areas. These approaches include Euler's elastica model [2], [3] as well as the Total Variation model [4]. They perform well for small target areas; however, they are prone to blurring when the damaged region is large.
Exemplar-based methods [5] have been proposed for filling in large damaged regions. The method in [6] processes images in a greedy fashion; many globally optimized approaches have been proposed to address this limitation. Some methods [7], [8] improve image completion by using a sample image. Komodakis et al. [9] employed an optimization method that yielded more coherent results by using a well-defined objective function corresponding to the energy of a Markov Random Field (MRF). This method is computationally expensive; however, the fast PatchMatch method [10] largely solves this problem. As pure patch translation cannot capture the full structure of an image, some methods [11], [12], [13] have adopted photometric as well as geometric transformations to address this issue. He et al. [14] filled the target using regular structures obtained from the statistics of patch offsets. However, target regions that span multiple planes or are non-facade still pose a problem for image completion.
Here, we present a new method for completing target regions in images that have multiple planes or are non-facade. We observe that repetitive regularities can be extracted within each plane using planar attributes, which we define as the prior probabilities of patch transforms and offsets. Furthermore, to improve the global optimization associated with image completion, we propose a new optimization scheme, termed planar-priority belief propagation (PPBP), which applies belief propagation (BP) [15] based on planar priorities.
The three main features of the proposed method can be summarized as follows. First, we extract effective planar information and repetitive regularities to guide image completion. Second, we define an objective function using this planar knowledge and optimize it using PPBP, a new optimization scheme that utilizes belief propagation based on planar priorities. Third, we show that damaged images with multiple planes or non-facade content are completed seamlessly and plausibly.
2. Image Analysis
In order to analyze the regularities of the known image regions, we extract planar information in addition to repetitive regularities from images. Our image completion method benefits from this analysis as it provides enhanced mid-level information.
2.1 Planar Information
To identify and rectify perspective-distorted planes, we use vanishing point estimation, which includes line segment extraction and vanishing point grouping. Here, we briefly introduce this method. First, edges are detected and line segments are fitted to the known image areas. Next, three vanishing points (VPs) [16] are detected using a RANSAC-based voting algorithm. We assume the image has at most three different plane orientations; this assumption suits most situations, including man-made scenes. For the example in Fig. 1, three vanishing points were detected, corresponding to the red, blue, and green line segments.
Fig. 1. Vanishing point detection
To describe the three plane orientations, each pair of the three VPs denotes one orientation. We use the vanishing line to express the parameters of plane k as follows:

l_k = v_i × v_j

where v_i and v_j are the homogeneous coordinates of the two VPs, and l_k = (l1, l2, l3)^T has two degrees of freedom and is homogeneous. We can affine-rectify a plane of the perspective image, so that lines that are parallel in 3D remain parallel in 2D, with the perspective transformation matrix

G_k = [1 0 0; 0 1 0; l1 l2 l3]

which maps the vanishing line l_k to the line at infinity.
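As a concrete sketch, the rectifying matrix built from a vanishing line and its application to an image point can be written as follows. This is the standard construction from a vanishing line; treating it as the paper's G_k is our assumption.

```python
import numpy as np

# Rectifying homography G_k from a vanishing line l_k = (l1, l2, l3):
# G_k moves l_k to the line at infinity (0, 0, 1), which makes 3D-parallel
# lines parallel again in the 2D image.
def rectifying_homography(l_k):
    l1, l2, l3 = l_k
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [l1,  l2,  l3]])

def apply_homography(G, p):
    """Map a 2D point p through G using homogeneous coordinates."""
    q = G @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]

# For the fronto-parallel plane the vanishing line is (0, 0, 1), so G_k is
# the identity and points are left unchanged.
G = rectifying_homography((0.0, 0.0, 1.0))
print(apply_homography(G, (3.0, 4.0)))  # -> [3. 4.]
```

Lines transform through the inverse transpose of G_k, so one can verify that G_k indeed sends l_k to the line at infinity.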
To automatically reconstruct the single viewpoint of man-made scenes, we use a simple, direct method. Noting that two sets of parallel lines can compose a plane, we reason that the line segments of two distinct VPs can lie within the same plane. Therefore, where the supports of two VPs overlap, a plane can be identified by the locations of the two sets of line segments.
The process consists of three steps. First, the spatial support of every VP is estimated by spreading its corresponding line segments with a Gaussian kernel. Second, the spatial support of each plane is estimated by element-wise multiplication of the line-density maps of its two supporting VPs; the resulting density map is high at locations where the two groups of line segments overlap. We set the density values uniformly to 10^-5 across the image for the fronto-parallel plane. Third, we normalize each pixel's probability in the density maps so that the plane index probabilities sum to 1. Pb[k|x] denotes the posterior probability of assigning plane index k to pixel x. The distributions of posterior probabilities are shown in Fig. 2. Plane localization is determined by the supporting line segments.
Fig. 2. The distributions of posterior probabilities of the plane
Because the lines running through the hole in the image cannot be detected by this method, we set the probability of each missing pixel equal to the probability of the boundary pixel.
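The three steps above can be sketched as follows. The binary per-VP line masks, the kernel width, and the array shapes are assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# line_masks[k] = (vp_a, vp_b): binary maps of the line segments voting for
# the two vanishing points that support plane k.
def plane_posteriors(line_masks, shape, sigma=15.0, floor=1e-5):
    densities = []
    for vp_a, vp_b in line_masks:
        # Step 1: spread each VP's segments with a Gaussian kernel.
        da = gaussian_filter(vp_a.astype(float), sigma)
        db = gaussian_filter(vp_b.astype(float), sigma)
        # Step 2: element-wise product of the two support maps; the density
        # is high where the two groups of line segments overlap.
        densities.append(np.maximum(da * db, floor))
    # The fronto-parallel plane gets a uniform density of 10^-5.
    densities.append(np.full(shape, floor))
    stack = np.stack(densities)
    # Step 3: normalize so the plane probabilities Pb[k|x] sum to 1 per pixel.
    return stack / stack.sum(axis=0, keepdims=True)
```

The returned array has one channel per plane (including the fronto-parallel one), and each pixel's channel values form a proper distribution over plane indices.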
2.2 Repetitive Regularity Extraction
Repetitive regularity exists widely in images of natural and man-made scenes, and it can help algorithms understand more about scene structure. As in the methods of He et al. [14] and Huang et al. [13], we discover translational repetitive regularities as offsets between matched features in images. Taking multiple foreshortened planes into consideration, we discover local repetitive regularity in the affine-rectified space.
First, for every feature point detected with the standard Difference-of-Gaussians detector, we compute SIFT descriptors [17] in the known region. Features are extracted in the source image rather than in the rectified space in order to avoid image regions that are distorted by slanted planes. The nearest-neighbor distance of every feature is measured using a kd-tree. The feature distance threshold was set to 0.2 in our experiments; feature matches are pairs whose distance falls below this threshold.
Second, all feature matches that have high posterior probability Pb[k|x] are extracted in every plane k. In other words, a feature match is extracted if the product of the two posterior probabilities is greater than 0.5.
Owing to perspective distortion, repetitive structures that are equally spaced in three dimensions do not necessarily maintain equal spacing in two-dimensional image space. We address this problem by affine-rectifying the positions of the feature matches.
Next, translational duplications can be discovered by analyzing the invariant displacement between the two rectified feature points. Clusters of displacement vectors typically correspond to more obvious repetitive regularities; a mean-shift algorithm is used to discover them. We define Dk = {di} as the set of translational regularities, with di ∈ R2 representing a displacement vector.
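A minimal sketch of this regularity extraction, assuming the feature matches are already rectified into an (n, 4) array of point pairs; the mean-shift step uses scikit-learn's implementation rather than the authors' own, and the bandwidth and cluster-size threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import MeanShift

# matches: (n, 4) array of rectified feature pairs (x1, y1, x2, y2).
# The dominant displacement vectors D_k are the mean-shift cluster centers
# of the per-match displacements, kept only when enough matches agree.
def translational_regularities(matches, bandwidth=5.0, min_count=3):
    displacements = matches[:, 2:4] - matches[:, 0:2]
    ms = MeanShift(bandwidth=bandwidth).fit(displacements)
    centers, labels = ms.cluster_centers_, ms.labels_
    # Larger clusters correspond to more obvious repetitive regularities.
    counts = np.bincount(labels)
    return [centers[i] for i in range(len(centers)) if counts[i] >= min_count]
```

For a facade with windows repeating every 10 pixels horizontally (in rectified space), the returned set would contain a displacement vector close to (10, 0).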
After accounting for multiple planes, the detection method is able to discover different repetitive regularities within each plane and can therefore be applied more generally to images with different types of content.
3. Image Completion
After acquiring information about planes and regularities, our method uses this information to fill the target regions using planar belief propagation, which builds on the method of Komodakis et al. [9].
For each input image, we set T as the target region and S as the source region. Our objective function is based on a discrete Markov Random Field [18]. The MRF nodes form a lattice of w × w patches that intersect the target region; the MRF edges Φ link neighboring nodes using the four-neighborhood method, as shown in Fig. 3.
Fig. 3. Nodes and edges of the MRF
3.1 Objective Function
We define the objective function as follows:

E = Σ over nodes ti of Esingle(si, ti) + Σ over edges (ti, tj) ∈ Φ of Epair(si, sj)

where Ω is the set of known pixel indices from which labels are drawn, T is the set of missing pixel indices, ti denotes a node's central position in T, si denotes the corresponding label's central position in Ω, ŝi denotes the central position of the label having maximum belief, and ki is the plane index of the node at ti. Esingle and Epair are the single-node and pairwise terms, respectively.
The single-node cost Esingle, for assigning node p(ti) the label p(si), computes the similarity between node p(ti) and the corresponding label p(si). To compute this similarity more accurately, we define a new cost function using the effective planar information:

Esingle(si, ti) = λ1 Ecolor(si, ti) + λ2 Eplane(si, ti) + λ3 Eorth(si, ti)

where λ1 = 1, λ2 = 10, and λ3 = 10^3 are the weighting parameters of the color cost, plane attribute, and orthogonal orientation terms, respectively.
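A small numeric sketch of this weighted combination; the three cost terms are assumed to be precomputed by the routines described in Sections 3.2-3.4.

```python
# Weights from the text: color cost, plane attribute, orthogonal orientation.
LAMBDA1, LAMBDA2, LAMBDA3 = 1.0, 10.0, 1e3

def e_single(e_color, e_plane, e_orth):
    """Weighted sum of the three single-node cost terms."""
    return LAMBDA1 * e_color + LAMBDA2 * e_plane + LAMBDA3 * e_orth
```

Note that the large weight on the orientation term means even a small deviation (capped at c = 0.02 in Section 3.4) can outweigh a substantial color difference, strongly steering labels toward the orthogonal orientations.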
3.2 Color Cost
We use the sum of absolute differences in RGB space to compute the color cost between each node and label:

Ecolor(si, ti) = ║p(ti) − q(si, ti, ki)║

where p(ti) is a w × w node centered at ti and q(si, ti, ki) is the geometrically transformed label centered at si. Using the parameters of plane ki and the plane direction at node location ti, we apply the corresponding geometric transformation to label q(si, ti, ki).
To enhance the effectiveness of the label space, we use a homography to sample labels. The homography is determined by the parameters si, ti, and ki. To map a label at ti, we calculate the transformation of the sampled label at si. Let t̃i be the homogeneous coordinate of ti, s̃i the homogeneous coordinate of si, and g1, g2, g3 the row vectors of Gki. The transformed positions of the node and label are computed as follows:

t'i = (g1 · t̃i, g2 · t̃i) / (g3 · t̃i),  s'i = (g1 · s̃i, g2 · s̃i) / (g3 · s̃i)
In the rectified space, the position offset vector d' runs from the node to the label. We express it as

d' = s'i − t'i
We recover the label's position from the rectified space with the inverse matrix:

s̃i ∝ Gki^{-1} (t'i + d')
To acquire the transformation parameters of the label at si, we combine the two equations above into a translation-offset matrix applied to the label domain at ti, with T(d') denoting the 3 × 3 homogeneous translation by d':

TSsi = Gki^{-1} T(d') Gki
where TSsi denotes the label's domain transformation at si.
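One way to read this construction is as a translation in the rectified space conjugated by the rectifying homography. The sketch below implements that reading; the matrix names mirror our notation and are assumptions, not the paper's code.

```python
import numpy as np

def translation(d):
    """3x3 homogeneous translation by a 2D offset d."""
    T = np.eye(3)
    T[0, 2], T[1, 2] = d
    return T

# Label-domain transform: rectify with G_k, translate by the rectified-space
# offset d', then map back with the inverse of G_k.
def label_transform(G_k, d_rect):
    return np.linalg.inv(G_k) @ translation(d_rect) @ G_k

# With G_k = I (fronto-parallel plane) the transform reduces to a plain 2D
# translation: the point (1, 1) is mapped to (6, -1) by the offset (5, -2).
TS = label_transform(np.eye(3), (5.0, -2.0))
p = TS @ np.array([1.0, 1.0, 1.0])
```

For a slanted plane, the same offset in rectified space corresponds to a foreshortened, position-dependent shift in the original image, which is exactly what the conjugation captures.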
3.3 Plane Attribute
We search for the maximum posterior probability Pb[ki|x] in order to assign plane index ki to pixel x. To facilitate the calculation, we compute the minimum value using the negative log-likelihood:

Eplane(si, ti) = −log Pb[ki|si]
3.4 Orthogonal Orientation
In most city scenes, there are many repetitive shapes that lie along the horizontal and vertical orientations, such as the balconies of multistory buildings. In these situations, it is difficult to maintain structural coherence when filling the target regions. Therefore, we encourage the labels to lie along one of the orthogonal orientations. We first compute the rotation angle θki such that the set of VP line segments maps to the horizontal axis; θki is defined by the mapping for the two VPs in plane ki.
Next, we compute the cost of the orthogonal orientation using the following equation:

Eorth(si, ti) = ψ(α)

where α is the deviation of the offset si − ti, rotated by θki, from the nearest orthogonal axis, and ψ(α) = min(|α|, c) keeps the cost no greater than a constant c = 0.02.
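The truncation function ψ is simply a capped magnitude, as the following one-liner makes explicit:

```python
def psi(alpha, c=0.02):
    """Truncated orientation cost: min(|alpha|, c), never greater than c."""
    return min(abs(alpha), c)
```

Truncating the cost prevents a single badly oriented candidate from being penalized without bound, which keeps the orientation term robust to outliers.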
3.5 Planar Priority BP
The next step in our method is to use belief propagation to optimize the energy objective function. However, for such a large number of labels, BP's computational cost is prohibitive. We developed a novel MRF optimization method, PPBP, that solves this problem accurately. PPBP consists of two elements: dynamic label cropping and planar-priority-based message scheduling. Dynamic label cropping reduces the computational cost by cropping irrelevant labels. Planar-priority-based message scheduling sends (computationally) cheap messages between the nodes.
BP is an algorithm that searches for a maximum a posteriori estimate by iteratively computing a set of update equations. In MRF graphs, repeated BP propagates local messages from each node to the other nodes. The algorithm converges when all the messages have stabilized, as shown in Fig. 4.
Fig. 4. Calculating messages in BP
A node p(ti) at ti sends a message mti,tj(sj) to a neighboring node p(tj) at tj, expressing how suitable it is for node p(tj) to be assigned label p(sj). Taking the single-node and pairwise terms into account, we define the message mti,tj(sj) as:

mti,tj(sj) = min over si ∈ L of { Esingle(si, ti) + Epair(si, sj) + Σ mtr,ti(si) },  summed over (tr, ti) ∈ Φ with tr ≠ tj

where L is the set of all labels, |L| is the number of labels, Φ is the set of MRF edges that link the nodes under the four-neighborhood method, and p(tr) ranges over the neighboring nodes of p(ti) other than p(tj). All messages received by node p(ti) must be computed before node p(ti) sends a message to node p(tj), as shown in Fig. 4.
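For a small label set, this message update can be sketched with precomputed cost tables; the array shapes and names below are assumptions of the sketch.

```python
import numpy as np

# e_single[si]: single-node cost of label si at node ti.
# e_pair[si, sj]: pairwise cost of labels si (at ti) and sj (at tj).
# incoming: list of message vectors m_{tr,ti} from ti's neighbors other
# than tj, each indexed by si.
def message(e_single, e_pair, incoming):
    # m_{ti,tj}(sj) = min over si of
    #   Esingle(si, ti) + Epair(si, sj) + sum of messages from other neighbors
    total = e_single + sum(incoming)              # shape (|L|,), indexed by si
    return (total[:, None] + e_pair).min(axis=0)  # shape (|L|,), indexed by sj
```

This is the usual min-sum form: the cost of each candidate sj is the best explanation over ti's own labels si, given everything ti has heard so far.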
In this way, the labels are eventually selected through the cooperation of all nodes, and each node learns which labels are appropriate for it. For a node p(ti) in the MRF, the planar belief bti(si) denotes the probability that node p(ti) is assigned label p(si). We compute the planar belief bti(si) using the following equation:

bti(si) = −Esingle(si, ti) − Σ mtr,ti(si),  summed over (tr, ti) ∈ Φ

where the second term collects all messages sent to node p(ti) from its four neighboring nodes, as shown in Fig. 5. After all messages have been estimated, every node chooses the label with the maximum planar belief.
Fig. 5. Calculating beliefs in BP
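The belief computation above can be sketched directly; in the min-sum convention beliefs are negated costs, so the best label has the maximum belief.

```python
import numpy as np

# Planar belief of node ti: negated single-node cost minus all messages
# received from its four neighbors, each message indexed by label si.
def planar_belief(e_single, incoming_messages):
    return -np.asarray(e_single, dtype=float) - sum(incoming_messages)

def best_label(e_single, incoming_messages):
    """Every node chooses the label with the maximum planar belief."""
    return int(np.argmax(planar_belief(e_single, incoming_messages)))
```

This convention matches the thresholds in Section 4, where the belief threshold bt is set to a negated color cost.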
3.5.1 Planar priority-based message scheduling
For a typical image, the number of labels |L| can exceed several million. Unfortunately, the computational cost of updating all messages is O(|L|^2), which is intolerable. It is therefore critical to reduce this large number of labels using beliefs.
We designed a message scheduling scheme using the planar priority of each node in the MRF. The key principle behind message scheduling is that the messages should be sent from the node with highest planar priority to its neighboring nodes.
PPBP consists of two main steps, forward and backward propagation. The goal of forward propagation is to crop labels. The node with the highest planar priority is visited first. In each round of forward propagation, we visit a node p once and mark it as "committed". We then crop the labels of node p and send cheap messages to its neighboring nodes, excluding committed nodes. Finally, we update the nodes that have received new messages. This process repeats until all of the nodes are committed.
The backward propagation step simply visits the nodes in the opposite order and sends messages from each node. This message scheduling scheme is shown in Fig. 6. During forward propagation, the green nodes are committed and the other nodes are uncommitted. The node p(ti) is committed first, as it has the highest planar priority among the uncommitted (blue) nodes, and transmits messages to the other uncommitted nodes along the two red edges. During backward propagation, nodes send messages along the dashed lines.
Fig. 6. Message scheduling scheme
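The forward pass of this scheduling scheme can be sketched with a max-priority queue. Here `priority` and `send_message` stand in for the belief machinery and are assumptions of the sketch; in the full method, receiving a message changes a node's beliefs and hence its priority, which the lazy re-push models.

```python
import heapq

def forward_propagation(nodes, neighbors, priority, send_message):
    """Commit nodes in order of planar priority, messaging uncommitted ones."""
    committed, order = set(), []
    heap = [(-priority(n), n) for n in nodes]  # negate: heapq is a min-heap
    heapq.heapify(heap)
    while heap:
        _, n = heapq.heappop(heap)
        if n in committed:
            continue  # stale entry from an earlier re-push
        committed.add(n)
        order.append(n)
        for m in neighbors[n]:
            if m not in committed:
                send_message(n, m)
                # m's beliefs (and thus priority) may have changed: re-push.
                heapq.heappush(heap, (-priority(m), m))
    return order  # backward propagation visits this order in reverse
```

On a three-node chain with fixed priorities 3, 1, 2, the commit order is node 1, then node 3, then node 2, with messages sent only toward uncommitted neighbors.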
3.6 Assigning planar priorities to nodes
Assigning appropriate priorities to nodes is critical for our approach. Priority is computed using only the planar beliefs bti(si) that were defined in equation 15. We define the priority of node p(ti) as follows:

P(ti) = 1 / NB(ti)

where, to discard the many labels with low planar belief, we define a planar belief threshold bt. The relative planar belief of node p(ti) is brel,ti(si) = bti(si) − bti(ŝi), and NB(ti) = |{si ∈ L : brel,ti(si) ≥ bt}| counts the labels whose relative belief exceeds the threshold. The fewer such labels a node has, the more certain it is about its label, and the higher its priority.
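A sketch of this priority computation, under our reading of the definitions above (the belief array and threshold convention are assumptions):

```python
import numpy as np

# beliefs[si]: planar beliefs of one node over all labels; bt < 0 is the
# planar belief threshold. Fewer plausible labels -> higher priority.
def planar_priority(beliefs, bt):
    relative = beliefs - beliefs.max()   # relative planar beliefs, all <= 0
    nb = int((relative >= bt).sum())     # NB(ti): labels above the threshold
    return 1.0 / nb                      # the best label always counts, nb >= 1
```

A node whose best label clearly beats all others has NB = 1 and priority 1, so it is committed early; an ambiguous node with many near-best labels gets a small priority and waits for more messages.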
3.6.1 Label cropping
The goal of label cropping is to discard labels that cannot plausibly be assigned to a node during forward propagation. After a node p is committed, labels whose relative belief at node p is less than the crop belief threshold pt are designated for cropping; the remaining labels are the active labels of node p. In practice, we only crop the labels of nodes that have more active labels than a user-defined threshold nt. Therefore, for a node p to be committed, its labels are traversed in order of decreasing belief and marked active until the number of active labels reaches nt or no remaining label's relative belief is greater than pt.
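The cropping rule can be sketched as follows; the belief array and threshold conventions mirror the sketch assumptions above rather than the paper's exact data structures.

```python
import numpy as np

# Keep labels in order of decreasing relative belief until n_t labels are
# active or the remaining labels fall below the crop threshold p_t (< 0).
def active_labels(beliefs, p_t, n_t):
    relative = beliefs - beliefs.max()
    order = np.argsort(-relative)        # traverse labels best-first
    return [int(s) for s in order
            if relative[s] >= p_t][:n_t]
```

Messages during forward propagation then only need to range over each node's active labels, which is what makes them cheap to send.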
4. Experiments
Our approach was executed on a Windows 7 system with an Intel Core™ i7-4700K 3.5 GHz processor, using Matlab 2013a and Visual Studio 2013 as the development environment. The patch size was 7 × 7. Depending on the difficulty of the damaged image, nt was set in the range [15, 60]. The planar belief threshold was set to bt = −Ecolor(si, ti) = −║q(si) − p(ti)║, where Ecolor(si, ti) is the mean sum of absolute color differences between patches in RGB space. The crop belief threshold was set to pt = 2bt.
To demonstrate the efficiency and superiority of the proposed method, we compared it with current state-of-the-art approaches, including Komodakis' method, Image Melding, and He's method. Fig. 7 shows the results of running these approaches on six target images. Under visual inspection, our method yielded the most plausible results.
Fig. 7. Comparison of results obtained using different image completion methods.
The first damaged image had a regular structure: a planar building facade with a few perspective distortions. Our method produced plausible results using the effective plane constraints. The other methods could not maintain the consistency of line structures using only translated patches, which are insufficient to synthesize the type of textures needed to plausibly complete the image.
The second damaged image included repetitive regularities. Our algorithm filled the target region using the recovered plane directions, with the repetitive structures as guides. The alternative methods failed to fill the large target region, as they were unable to maintain global consistency.
By comparing the results of the different image completion methods, it is clear that our method is robust and capable of filling difficult images with multiple planes.
4.1 Image completion performance measured in PSNR
The purpose of image inpainting is to obtain a result that is convincing to human observers. One important test is visual inspection; another is quantitative evaluation using the Peak Signal-to-Noise Ratio (PSNR). The PSNR comparison of the six images in Fig. 7 is shown in Table 1.
Table 1. Image completion performance measured in PSNR
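For reference, the PSNR values above follow the standard definition for 8-bit images; this sketch is the textbook formula, not code from the paper.

```python
import numpy as np

def psnr(result, reference):
    """Peak Signal-to-Noise Ratio in dB between two 8-bit images."""
    mse = np.mean((result.astype(float) - reference.astype(float)) ** 2)
    # Identical images have zero error and, formally, infinite PSNR.
    return float('inf') if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)
```

PSNR is computed against the original undamaged image, so higher values indicate completions closer to the ground truth.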
As shown in Fig. 8, we compared the PSNR of the six images. Photoshop shows a gain of 0.29 dB in mean performance over the baseline and Komodakis' method a gain of 0.39 dB, whereas our method gains 0.98 dB over the baseline. The PSNR comparison reveals that our approach performs more effectively than the other three methods.
Fig. 8.PSNR comparison.
Regarding visual inspection, our approach produces better results than the other methods in terms of image coherence and consistency. In the PSNR comparison, we likewise obtain a slightly higher overall PSNR value than the other methods. Therefore, our approach performs better in terms of both visual inspection and PSNR.
5. Conclusion
We have proposed a new method of image completion using BP based on planar priorities. To avoid disrupting the structures in images that have multiple planes or are non-facade, we fill the target regions using planar information derived from the prior probabilities of patch transforms and offsets. We employ PPBP to optimize the MRF using planar-priority-based message scheduling and dynamic label cropping. We evaluated our approach on many images and obtained promising, plausible results in all cases.
References
- M. Bertalmio, G. Sapiro, V. Caselles and C. Ballester, "Image inpainting," in Proc. of the 27th Annual Conference on Computer Graphics and Interactive Techniques, pp. 417-424, July 2000.
- T. F. Chan and J. Shen, "Nontexture inpainting by curvature-driven diffusions," Journal of Visual Communication and Image Representation, vol. 12, no. 4, pp. 436-449, 2001. https://doi.org/10.1006/jvci.2001.0487
- D. Tschumperlé, "Fast anisotropic smoothing of multi-valued images using curvature-preserving PDE's," International Journal of Computer Vision, vol. 68, no. 1, pp. 65-82, 2006. https://doi.org/10.1007/s11263-006-5631-z
- J. Shen and T. Chan, "Mathematical models for local nontexture inpaintings," SIAM Journal on Applied Mathematics, vol. 62, no. 3, pp. 1019-1043, 2002. https://doi.org/10.1137/S0036139900368844
- A. Efros and T. K. Leung, "Texture synthesis by non-parametric sampling," in Proc. of the Seventh IEEE International Conference on Computer Vision, vol. 2, pp. 1033-1038, 1999.
- A. Criminisi, P. Pérez and K. Toyama, "Region filling and object removal by exemplar-based image inpainting," IEEE Transactions on Image Processing, vol. 13, no. 9, pp. 1200-1212, 2004. https://doi.org/10.1109/TIP.2004.833105
- M. Xiao, G. Y. Li, L. Xie, Y. L. Tan and Y. H. Mao, "Contour-guided image completion using a sample image," Journal of Electronic Imaging, vol. 24, no. 2, pp. 023029, 2015. https://doi.org/10.1117/1.JEI.24.2.023029
- M. Xiao, G. Li, L. Xie, L. Peng, Y. J. Lv and Y. H. Mao, "Completion of images of historical artifacts based on salient shapes," Optik - International Journal for Light and Electron Optics, vol. 127, no. 1, pp. 396-400, 2016. https://doi.org/10.1016/j.ijleo.2015.10.002
- N. Komodakis and G. Tziritas, "Image completion using efficient belief propagation via priority scheduling and dynamic pruning," IEEE Transactions on Image Processing, vol. 16, no. 11, pp. 2649-2661, 2007. https://doi.org/10.1109/TIP.2007.906269
- C. Barnes, E. Shechtman, A. Finkelstein and D. Goldman, "PatchMatch: a randomized correspondence algorithm for structural image editing," ACM Transactions on Graphics, vol. 28, no. 3, p. 24, 2009.
- S. Darabi, E. Shechtman, C. Barnes, D. B. Goldman and P. Sen, "Image melding: combining inconsistent images using patch-based synthesis," ACM Transactions on Graphics, vol. 31, no. 4, p. 82, 2012.
- A. Mansfield, M. Prasad, C. Rother, T. Sharp, P. Kohli and L. J. Van Gool, "Transforming image completion," in Proc. of the BMVC, pp. 1-11, August 2011.
- J. Huang, J. Kopf, N. Ahuja and S. B. Kang, "Transformation guided image completion," in Proc. of the IEEE International Conference on Computational Photography, pp. 1-9, April 2013.
- K. He and J. Sun, "Statistics of patch offsets for image completion," in Computer Vision - ECCV 2012, pp. 16-29, 2012.
- W. T. Freeman, E. C. Pasztor and O. T. Carmichael, "Learning low-level vision," International Journal of Computer Vision, vol. 40, no. 1, pp. 25-47, 2000. https://doi.org/10.1023/A:1026501619075
- R. Hartley and A. Zisserman, "Multiple view geometry in computer vision," Robotica, vol. 23, no. 2, p. 271, 2005.
- D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004. https://doi.org/10.1023/B:VISI.0000029664.99615.94
- P. F. Felzenszwalb and D. P. Huttenlocher, "Efficient belief propagation for early vision," International Journal of Computer Vision, vol. 70, no. 1, pp. 41-54, 2006. https://doi.org/10.1007/s11263-006-7899-4