Disparity estimation using Graph cuts for road applications

This paper proposes a new edge-based stereo matching approach for road applications. The approach consists of matching the edge points extracted from the input stereo images using temporal constraints. At the current frame, we propose to estimate a disparity range for each image line based on the disparity map of the preceding frame. The stereo images are divided into multiple parts according to the estimated disparity ranges, and the optimal solution of each part is independently approximated via the state-of-the-art energy minimization approach Graph cuts. The disparity search space of each image part is very small compared to the global one, which improves the results and reduces the execution time. Furthermore, as a similarity criterion between corresponding edge points, we propose a new cost function based on the intensity, the gradient magnitude and the gradient orientation. The proposed method has been tested on virtual stereo images and compared with a recently proposed method; the results are satisfactory.


Introduction
In the field of vehicle navigation, stereo vision has been applied to a large variety of applications, such as obstacle detection and tracking [1,2], traffic sign detection and recognition [3][4][5][6], pedestrian detection and tracking [7], and so on. The key problem in stereo vision is the stereo matching problem, also called disparity estimation. By comparing information about a scene from two viewpoints, 3D information can be extracted by examining the relative positions of objects in the two stereo images [8][9][10], which allows an accurate and detailed 3D representation of the environment around the Intelligent Vehicle (IV). Examples of disparity estimation methods are available in [11][12][13][14][15]. A taxonomy of dense disparity estimation algorithms, together with a testbed for quantitative evaluation of stereo algorithms, is provided by Scharstein and Szeliski [16]. It was demonstrated in [16] that Graph cuts methods [17][18][19][20] produce good results. However, they are time consuming. To avoid this problem, in this work the edges of the stereo images are extracted to reconstruct the scene.
The present work is devoted to road applications. Consequently, the matching algorithm will be applied to each stereo pair acquired over time. Incorporating temporal information in stereo approaches can improve the matching results, as mentioned in [21][22][23][24][25].
This work is an improvement of our recently proposed method [26]. Instead of using Dynamic Programming to match the edge points of the stereo images, we use the Graph cuts algorithm to take into account the smoothness that should hold between neighboring edge points. In addition, we improve the cost function used in [26] by considering the neighboring pixels of the edge points to be matched. The same idea as in [26] is used to compute a disparity range for each image line. The stereo images are divided into multiple parts according to the estimated disparity ranges, and the Graph cuts algorithm is applied to match the edge points of each part independently. The estimated disparity ranges reduce the number of possible matches, which discards false candidates and consequently improves the results. They also reduce the execution time of the applied energy minimization approach.

Related work
Using a stereo sequence presents advantages over single-frame stereo: when the matching is ambiguous, temporal information may help the matching process to improve accuracy and to enforce temporal consistency of the disparity maps. In the literature, several types of methods exploit temporal consistency to improve stereo matching, e.g. optical flow, spatio-temporal and disparity prediction methods [27][28][29][30][31].
Optical flow methods extend the optical flow to a 3D motion field and take stereo and motion into account simultaneously [32]. Tao et al. [30] proposed a depth estimation method for non-rigid dynamic 3D scenes, in which the scene is represented by a collection of 3D piecewise planar surface patches based on color segmentation of the input images. This representation is estimated by an incremental formulation. The spatial match measurement and the scene flow constraint [33,34] are employed in the matching process. The algorithm's execution time and the accuracy of its results are limited by the image segmentation algorithm used. Hung et al. [35] proposed a depth and image scene flow estimation method that preserves motion-depth temporal consistency. Zhang et al. [34] proposed a 3D scene flow computation method in which a 3D motion model is fitted to each local image region and adaptive global smoothness regularization is applied to the whole image.
Spatio-temporal methods extend the spatial window used in the cost function to a spatio-temporal window, which takes the spatial information of the stereo image pairs and the temporal information between consecutive images into account simultaneously. Zhang et al. [31] use spatial and temporal information by extending the spatial window, used to compute the sum of squared differences (SSD) cost function, to a spatio-temporal window used to compute the sum of SSDs (SSSD). Their method performs well in static scenes, but fails with dynamic scenes. Davis et al. [28] developed a framework called space-time stereo, similar to the one developed in [31], to recover shapes by studying space-time windows; however, this method also does not give good results in dynamic scenes. Zhang et al. [36] presented a novel method for recovering depth maps from a video sequence.
Disparity prediction methods use the results computed at the previous frame to compute the disparity map of the current frame. A method to predict the depth map between consecutive frames based on feature detection, edge motion estimation and motion detection was proposed by Jiang et al. [25,37]. Dobias et al. [38] presented a method to transfer the already computed depth map of the previous frame to the current frame using the estimated motion of the calibrated stereo rig. Cech et al. [39] predicted pixel correspondences based on the motion of pixels to compute a disparity map as well as an optical flow map between consecutive frames. In [29], an algorithm was developed to compute both disparity maps and disparity flow maps in an integrated process; the disparity map generated for the current frame is used to predict the disparity map of the next frame.
Other methods do not fall into any of the above types. El Ansari et al. [22] presented a stereo method that uses temporal information in the matching process. A disparity range is deduced based on both an association algorithm and the disparity of the previous frame, and this range is integrated in Dynamic Programming (DP) to compute the current disparity map. Mazoul et al. [24] match edge curves in adjacent frames based on the same technique as [22]. The matched edge curves are then used to estimate disparity ranges, together with "matching control edge" points, to divide the search space of the DP.

Proposed approach
In this section, we describe the main steps, summarized in Fig. 1, of the proposed stereo matching approach. Note that the stereoscopic sensor used in our experiments is mounted aboard an IV and provides rectified images (i.e., corresponding pixels have the same y-coordinate). We start by presenting the features to be matched by the proposed approach. Then, we present the constraints that pairs of corresponding edge points should meet. Afterward, we present the proposed cost function. The last sub-section describes the Graph cuts algorithm used for the matching process.
The following notations are considered for the rest of the paper:
• I_k^L and I_k^R denote the left and right stereo images acquired at time k, respectively.
• D_k and D_{k-1} denote the disparity maps computed at the current frame and its preceding one, respectively.

Edge extraction
The first step consists in extracting significant features from the stereo images to be matched.
In this work, we are interested in employing edge points as features. The edges of the stereo images are extracted to reconstruct the scene, with the goal of solving both the matching ambiguity caused by large areas of similar pixels and the high computational cost of dense reconstruction. To extract edges, we used the Canny edge operator [40], because it yields continuous edge curves, which are vital to the proposed matching method, and for its detection precision.
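As a rough illustration of this step, the sketch below thresholds a Sobel gradient magnitude. It is a simplified stand-in for the Canny operator used in the paper (no smoothing, non-maximum suppression or hysteresis, so unlike Canny it does not guarantee thin, continuous edge curves); function name and threshold are our own.

```python
import numpy as np

def gradient_edges(img, thresh=50.0):
    """Minimal gradient-magnitude edge detector (simplified stand-in for
    Canny). img: 2D float array; returns a boolean edge mask, borders False."""
    gx = np.zeros_like(img, dtype=float)
    gy = np.zeros_like(img, dtype=float)
    # Sobel responses on the interior, written as shifted-slice differences
    gx[1:-1, 1:-1] = (img[:-2, 2:] + 2 * img[1:-1, 2:] + img[2:, 2:]
                      - img[:-2, :-2] - 2 * img[1:-1, :-2] - img[2:, :-2])
    gy[1:-1, 1:-1] = (img[2:, :-2] + 2 * img[2:, 1:-1] + img[2:, 2:]
                      - img[:-2, :-2] - 2 * img[:-2, 1:-1] - img[:-2, 2:])
    mag = np.hypot(gx, gy)  # gradient magnitude
    return mag > thresh
```

On a synthetic image with a vertical intensity step, the mask fires on the two columns adjacent to the step and nowhere in the flat regions.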

Disparity constraints
In order to discard false matches, we consider three constraints. The first one is the geometric constraint defining the minimum disparity threshold, resulting from the sensor geometry: a pair of edge points e_i^L and e_j^R, appearing in the left and right image lines, respectively, represents a possible match only if x_i^L - x_j^R is at least the minimum disparity threshold, where x_i^L and x_j^R are the x-coordinates of e_i^L and e_j^R, respectively. The second one is the constraint of similarity between candidate pairs. Finally, the third one is the maximum disparity threshold, which bounds x_i^L - x_j^R from above.
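The three constraints can be expressed as a simple candidate filter (a sketch; the function and parameter names are ours, and the similarity test is abstracted as a boolean input):

```python
def is_candidate_pair(x_left, x_right, d_min, d_max, similar):
    """Return True if edge points at x_left (left image) and x_right (right
    image), on the same epipolar line, satisfy the three constraints:
    minimum disparity, similarity, and maximum disparity. For a rectified
    pair the disparity is x_left - x_right."""
    d = x_left - x_right
    return (d_min <= d) and (d <= d_max) and similar
```

In the proposed approach, d_min and d_max come from the per-part disparity ranges estimated in the next sub-section, so this filter is what shrinks the search space.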

Disparity ranges estimation
In this work, we use the same idea as [26] to determine the disparity range for each image line, by analyzing the v-disparity of the disparity map computed at the previous frame. In the context of IV, the frame rate is very high. Therefore, the disparity values of the objects in the images will not undergo big variations between frames. Likewise, the oblique line representing the road in the v-disparity map of the current frame will have a position very close to the one it has in the v-disparity map of the preceding frame. The v-disparity map of the previous frame is divided into two parts: the top part contains the objects, while the bottom part contains the road. For the top part, the maximum disparity is that of the closest object in the scene. Knowing the disparity d_{k-1}^O of the closest object at the preceding frame, we can deduce that the disparity of this object at the current frame is less than d_{k-1}^O + Δd_max, where Δd_max is the maximum possible disparity difference between two adjacent frames. It is set to 4 in this work. The disparity range in this part is [0, d_{k-1}^O + Δd_max + α] (see Fig. 2), where α is an uncertainty value to select.
For the bottom part, the road is represented by an oblique line, so we have only one possible disparity value for each image line. For the image line y_i, the only possible disparity is (y_i − b)/a, where a and b are the parameters of the oblique line equation. In order to take into account the uncertainty inherent to the estimation, the disparity range in this part is [(y_i − b)/a − α, (y_i − b)/a + α] (see Fig. 2).
The line separating the two parts is the line at which the disparity is equal to the disparity of the closest object, assuming the maximum displacement Δd_max. Fig. 2 illustrates how to get both disparity ranges from the top and bottom parts of the v-disparity map.
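The two ingredients of this step can be sketched as follows: accumulating the previous disparity map into a v-disparity histogram, and deriving the top-part range from the closest object's previous disparity. This is a minimal sketch under stated assumptions (disparities are non-negative integers, -1 marks unmatched pixels, and α = 2 is an illustrative choice; the paper leaves α to be selected).

```python
import numpy as np

def v_disparity(disp, d_max):
    """Build a v-disparity map: for each image line y, a histogram of the
    disparity values occurring on that line. disp: 2D int array (previous
    frame's disparity map, -1 = unmatched pixel)."""
    h, _ = disp.shape
    vmap = np.zeros((h, d_max + 1), dtype=int)
    for y in range(h):
        row = disp[y]
        for d in row[row >= 0]:  # skip unmatched pixels
            vmap[y, d] += 1
    return vmap

def top_part_range(d_closest_prev, delta_d_max=4, alpha=2):
    """Disparity range for the top (object) part of the image, following
    the text: [0, d^O_{k-1} + Δd_max + α]. Δd_max = 4 as in the paper;
    alpha = 2 is an assumed uncertainty value."""
    return (0, d_closest_prev + delta_d_max + alpha)
```

For the bottom part, the range of line y_i follows directly from the fitted road line: [(y_i − b)/a − α, (y_i − b)/a + α].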

Image parts search
The two stereo images are divided into multiple parts according to the disparity ranges computed in sub-section 3.2.1. In the top part of the v-disparity, we have only one disparity range, namely [0, d_{k-1}^O + Δd_max + α]. Thus, all the image lines belonging to this v-disparity part are considered as the first image part. However, in the bottom part of the v-disparity, we have a disparity range [(y_i − b)/a − α, (y_i − b)/a + α] for each image line y_i. We divide this v-disparity part into several image parts, each containing a well-defined number of image lines. The disparity range of each part is defined as follows:
• The minimum disparity value is (y_i − b)/a − ↵, where y_i is the line that has the lower index i in the part.
• The maximum disparity value is (y_i − b)/a + α, where y_i is the line that has the higher index i in the part.
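The two bullets above translate directly into a small helper (a sketch; the function name is ours). Note that the line with the lower index yields the minimum disparity because, for the road, disparity grows with the line index y:

```python
def part_disparity_range(y_low, y_high, a, b, alpha):
    """Disparity range for a road image part spanning lines y_low..y_high,
    given the v-disparity road line d = (y - b) / a fitted on the previous
    frame and the uncertainty value alpha."""
    d_min = (y_low - b) / a - alpha   # lower line index -> minimum disparity
    d_max = (y_high - b) / a + alpha  # higher line index -> maximum disparity
    return (d_min, d_max)
```

Each such range is then what the per-part Graph cuts optimization searches over, instead of the full global range.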

Proposed cost function
One of the most important steps of a stereo matching algorithm is the cost computation [42], which is crucial for the quality of the disparity map. A given cost function is used to measure similarities between corresponding features, which helps find the best candidate among the possible matches. In this work, as a similarity criterion between corresponding edge points, we propose a new cost function based on the one used in [26], which is defined from the intensity, the gradient magnitude and the gradient orientation at the edge points. Let us denote this cost function C_DIGO (Difference of Intensity, Gradient and Orientation). Let e_i^L and e_i^R be two edge points belonging to two corresponding epipolar lines of the left and right images; C_DIGO between them is computed as in [26]. The cost function we propose takes into account the considered pixel and its neighborhood: it aggregates C_DIGO over the neighboring pixels, where d is the disparity and N is the set of neighboring pixels of the edge point e_i^L.
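Since the exact formula of C_DIGO is given in [26] and not reproduced here, the sketch below shows one plausible form of a DIGO-style cost: a weighted sum of absolute differences of intensity I, gradient magnitude G and gradient orientation O, aggregated over a horizontal neighborhood of the edge points. The weights, the neighborhood shape and all names are our assumptions, not the paper's definition.

```python
import numpy as np

def c_digo_neighborhood(I_L, I_R, G_L, G_R, O_L, O_R,
                        y, x_left, x_right, n=1, w=(1.0, 1.0, 1.0)):
    """Illustrative DIGO-style cost (NOT the paper's exact formula):
    weighted absolute differences of intensity (I), gradient magnitude (G)
    and gradient orientation (O), summed over the (2n+1)-pixel horizontal
    neighborhood of e_i^L at x_left and e_i^R at x_right on line y.
    Lower cost = more similar edge points."""
    cost = 0.0
    for dx in range(-n, n + 1):
        cost += (w[0] * abs(float(I_L[y, x_left + dx]) - float(I_R[y, x_right + dx]))
                 + w[1] * abs(float(G_L[y, x_left + dx]) - float(G_R[y, x_right + dx]))
                 + w[2] * abs(float(O_L[y, x_left + dx]) - float(O_R[y, x_right + dx])))
    return cost
```

A pair of identical neighborhoods yields cost 0, and the cost grows with any of the three differences, which is the behavior the text describes.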

Graph cuts
As discussed in [18,43], a graph is composed of a set V of nodes and a set E of directed edges, each with a non-negative weight. A cut of a graph partitions its nodes into two disjoint subsets V_S and V_T such that the source S is in V_S and the sink T is in V_T. The cost of the cut is the sum of the weights of the edges between the two partitions V_S and V_T, and a minimum cut of the graph is a cut with minimal cost. The minimum cut problem can be solved by finding a maximum flow from the source S to the sink T. In [18,43], the matching problem is formulated as the minimization of an energy function (see equation 3) via graph cuts; the minimum cut corresponds to the minimum cost. The energy of a configuration f comprises several terms: the data term measures how well matched pairs fit, the occlusion term penalizes the number of occluded pixels, and the smoothness term forces neighboring pixels in the same image to have similar disparities. The data term used in [18,43] imposes a penalty based on the intensity differences of corresponding pixels; in this work, the data term is based on the cost function proposed in section 3.3.

The accurate choice of the maximum and minimum disparity thresholds is crucial to the quality of the output disparity map and the computation time of almost any stereo matching method. Instead of applying the energy minimization algorithm to the whole image, where the disparity range is very large, we apply it to each image part obtained in section 3.2.2 independently. The disparity range of each image part is very small compared to the global disparity range, which improves the results and reduces the execution time. The graph cuts algorithm is applied to the edge points extracted from the stereo images.
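The min-cut/max-flow equivalence used above can be illustrated with a toy Edmonds-Karp solver (BFS augmenting paths): the returned maximum flow value equals the cost of a minimum s-t cut. This is a didactic sketch on a tiny graph; the graph-cut stereo methods of [18,43] rely on specialized max-flow algorithms tuned for grid graphs.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp max-flow. capacity: dict {(u, v): weight}. By the
    max-flow/min-cut theorem, the returned value is the min s-t cut cost."""
    capacity = dict(capacity)  # work on a copy; reverse edges get capacity 0
    adj = {}
    for (u, v) in list(capacity):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
        capacity.setdefault((v, u), 0)
    flow = {e: 0 for e in capacity}
    total = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph
        parent = {source: None}
        q = deque([source])
        while q and sink not in parent:
            u = q.popleft()
            for v in adj.get(u, ()):
                if v not in parent and capacity[(u, v)] - flow[(u, v)] > 0:
                    parent[v] = u
                    q.append(v)
        if sink not in parent:
            return total  # no augmenting path left: flow is maximal
        # recover the path and its bottleneck residual capacity
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(capacity[e] - flow[e] for e in path)
        for (u, v) in path:
            flow[(u, v)] += bottleneck
            flow[(v, u)] -= bottleneck
        total += bottleneck
```

On the classic 5-edge example graph, the minimum cut (and maximum flow) is 5.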

Experimental results
In this section, we discuss the experimental results obtained by the proposed stereo matching approach on virtual stereo sequences. The new method has been compared with a recently proposed method [26], with the method proposed in [24], and with the same Graph cuts algorithm used in the new method but without integrating temporal information. Let us refer to the proposed method as "New method", the one proposed in [26] as "Method 1", the one proposed in [24] as "Method 2" and the Graph cuts algorithm without temporal information as "Method 3". The hardware used in our experiments is a Lenovo T420 with an Intel(R) Core(TM) i5-5220M CPU at 2.50 GHz running Windows 8.
The method has been tested on the MARS/PRESCAN virtual stereo sequence [44]. The dataset contains a sequence of 512 × 512 stereo images and their ground truth. Fig. 4 illustrates the left stereo image #293 of the virtual sequence and its corresponding edge image. Fig. 5 depicts the disparity map computed by the new method at frame #293. We use false colors to make the disparity map clear: blue represents the nearest 3D points and red the farthest ones.
Table 1 compares the results obtained by the four methods at frame #293. It provides, for each method, the number of correct matches (NCM), the percentage of correct matches (PCM) and the execution time (ETime).
Table 1 allows the four methods to be compared at frame #293; a comparison between the four methods at other frames is given in Table 2.

Conclusion
In this paper, we presented a new fast spatio-temporal stereo matching method devoted to road applications. The main idea consists in using the disparity map obtained at the previous frame in the computation of the one at the current frame. The disparity map of the preceding frame serves to compute a disparity range for each image line. The stereo images are divided into multiple parts according to the estimated disparity ranges, and the optimal solution of each part is independently approximated via the Graph cuts algorithm. The search space of each image part is very small compared to the global search space, which improves the results and reduces the execution time. To choose the best candidate among the possible matches, we proposed a new cost function based on the intensity, the gradient magnitude and the gradient orientation. The proposed method has been tested on virtual sequences and the results are satisfactory. In future work, we will use the matching results to tackle the problem of obstacle detection and tracking.