Evaluation of Stereo Images Matching

Image matching, that is, finding correspondences between the two images of a stereo pair, is an essential task in digital photogrammetry and computer vision. Stereo images represent the same scene from two different perspectives, and therefore they typically contain a high degree of redundancy. This paper evaluates manual and automatic matching of a pair of images acquired with an overlapping area. Twenty-two target points are selected and matched manually. Auto-matching, based on the feature-based matching (FBM) method, is applied to these target points using the BRISK, FAST, Harris, and MinEigen algorithms. Auto-matching is conducted in two main phases: feature extraction (detection and description) and feature matching. The matching techniques used by these algorithms depend on local point (corner) features. The performance of the algorithms is assessed according to several criteria, such as the total number of auto-matched points and the number of target points that are auto-matched. The study aims to determine and evaluate the total root mean square error (RMSE) by comparing the coordinates of the manually matched target points with those obtained from auto-matching by each algorithm. According to the experimental results, the BRISK algorithm gives the highest number of auto-matched points (2942), while the Harris algorithm gives the lowest (378). All target points are auto-matched by the BRISK and FAST algorithms, while only 3 and 9 target points are auto-matched by the Harris and MinEigen algorithms, respectively. In the first image, the minimum total RMSE, 0.002651206 mm, is obtained between FAST and the manual match; in the second image, the minimum total RMSE, 0.002399477 mm, is obtained between Harris and the manual match.


Introduction
Hobrough provided the first solution to the image matching problem in the late 1950s, though it was analogue in nature [1,2]. The term "matching" refers to the process of establishing a relationship between two or more data sets (e.g., images, maps, or 3D shapes). Image matching, in particular, refers to the establishment of correspondences between two or more images [3,4] by analyzing the content, features, structure, relationships, texture, and grayscale values of the images and then comparing their similarity and consistency [5]. To create correspondences across a collection of images, where feature correspondences between two or more images are required, it is important to define a set of salient points in every image [6]. With the advancement of technology, image matching techniques have become increasingly important in a variety of applications, including military affairs [7], medicine [8], industry [9], license plate recognition [10], fingerprint recognition [11,12], face recognition [13], animal motion trajectory tracking systems [14], and face tracking shooting systems [15]. Image matching is a principal aspect of many problems in computer vision, including motion tracking [16], object recognition and matching [17,18], 3D reconstruction [19], stereo correspondence [20], image classification and retrieval [21], and camera calibration [22].
The computer vision field has grown rapidly in recent years, with the development of various techniques to perform particular tasks. One of these tasks is image matching [23]. Automatic matching methods are faster than manual matching methods (especially in the aerial triangulation of large image blocks), and the accuracy achieved is generally higher than or comparable to that obtained from analytical instruments. On the other hand, procedures for detecting and removing outliers are crucial for achieving high accuracy, because of the relatively high number of mismatches that usually appear among the large number of observations (redundancy principle) [1]. Generally, there are three basic matching methods: area-based matching, feature-based matching, and relational matching [1,2]. In this study, stereo image matching is accomplished both manually and automatically. The auto-matching is based on the feature-based matching (FBM) method with local point features, using the BRISK, FAST, Harris, and MinEigen algorithms. The total root mean square error (RMSE) is also determined and evaluated by comparing the manual and automatic matches.

Image Matching Methods
Feature-Based Matching (FBM). A feature is defined as something that can be measured in an image; that is, a feature is a number or a set of numbers extracted from a digital image [24]. There are two types of image features (frame features): local features and global features [12]. Local features traditionally aim at detecting and describing key points or interest regions [6], so that an image is represented by a set of many feature vectors [25]. Global features (such as color and texture), in contrast, describe an image as a whole and can be interpreted as a specific property of the image involving all pixels [6]; they represent the content of the entire image with a single multidimensional feature vector [25]. A feature can be an edge, a corner, an endpoint, a line, a curve, etc. [26]. Unlike area-based matching (ABM), which matches grey values directly, feature-based matching (FBM) methods match extracted features such as points, edges, or regions [1]. Image features are useful attributes that can be extracted from images or from regions within an image. Two examples of image features are the symmetry of a region of interest and the histogram of pixel values. Geometric descriptors such as the orientation or symmetry of regions of interest are examples of high-level features, while histograms of pixel values are called low-level features or primitive characteristics [27]. The feature point-based matching method has become a mainstream method for image matching due to its simple and rapid computation, high matching accuracy, and insensitivity to grayscale changes, lighting, graphic distortion, and occlusion [28].
Area-Based Matching (ABM). Grey values are the matching entities in area-based matching. Matching a single pixel is ambiguous, so the grey values of several neighboring pixels are used instead [1]. Fonseca and his colleagues were the first to develop area-based methods, which are also known as correlation methods or template matching [29]. In these methods the feature detection step is merged with the matching step: matching is achieved without detecting any salient objects. Predefined windows, or sometimes entire images, are used for correspondence estimation [30]. In intensity-based matching, the original or slightly modified (enhanced) image data is used as a matrix of grey values. Cross-correlation and least squares matching (LSM) are the most common methods of intensity-based matching, which is also known as area-based matching [2]. In contrast to feature-based methods, area-based methods typically use a much bigger template, which means they can tolerate more noise and scene changes. Area-based methods typically involve an image representation and a similarity measurement [31]. The original pixel values and specific similarity measures are used to find the correspondence between image pairs. Area-based methods fall into three types: mutual information-based methods, Fourier-based methods, and correlation-based methods [32].
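As an illustration of the correlation-based variant of area-based matching, the following minimal sketch slides a small template over a grey-value image and keeps the window with the highest normalized cross-correlation. It is not the procedure used in this study; the function names `ncc` and `best_match` and the toy grey values are assumptions made for the example.

```python
import math

def ncc(patch_a, patch_b):
    """Normalized cross-correlation of two equally sized grey-value patches."""
    a = [v for row in patch_a for v in row]
    b = [v for row in patch_b for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    # A constant patch has no texture to correlate; report 0 in that case.
    return num / den if den > 0 else 0.0

def best_match(template, image):
    """Slide the template over the image; return (row, col, score) of the best window."""
    th, tw = len(template), len(template[0])
    best = (0, 0, -2.0)  # NCC lies in [-1, 1], so -2 is below any real score
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            window = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(template, window)
            if score > best[2]:
                best = (r, c, score)
    return best

# Toy grey values (hypothetical): the template pattern sits at row 1, column 1.
image = [
    [10, 10, 10, 10, 10],
    [10, 50, 90, 10, 10],
    [10, 90, 50, 10, 10],
    [10, 10, 10, 10, 10],
]
template = [[50, 90], [90, 50]]
row, col, score = best_match(template, image)
```

Because every window is compared by its grey values alone, no features have to be detected first, which is the property that distinguishes ABM from FBM.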
Relational Matching. This method uses relationships, such as geometric or other relationships, between features and combinations of features (structures) [2]. The following table shows an overview of the matching methods, their entities, and their similarity measures.

Image Matching Process
Different image matching algorithms are based on different principles, but the image matching mechanism is essentially the same, as shown in Figure 1 [5]. First, the original image is preprocessed [5]. Second, the image matching information is extracted from the preprocessed image [5]. Feature detection and feature description are two independent steps in extracting local features. A feature detector aims to detect a set of interest regions (also known as keypoints), whereas the goal of a feature descriptor is to mathematically extract stable features from the information that surrounds the detected regions or keypoints [25]. Feature detection is the operation of computing an abstraction of the image information and making a local decision at each image point as to whether an image feature of a certain kind exists at that point [33]. In this step, interest points are selected at distinct locations in the original image that have unique content, such as corners and blobs, and this process must be executed robustly [34]. Feature detection is used to find interest points for further processing. These points are not necessarily associated with physical structures, such as the corners of a table.
The key to feature detection is finding features that remain locally invariant, so that they can be detected even when the image is rotated or scaled [35]. Detecting features is an important step before feature description; it finds the points and regions for which feature descriptors are computed. Most detectors fall into one of two categories: corner detectors and region detectors [36]. The most common term for the tool that extracts features from an image is detector [37]. A descriptor is used to depict the information contained within the neighborhood of a detected local feature [37]. In this stage, interest points should receive unique identifiers that are independent of the feature's scale and rotation; these identifiers are known as descriptors. Descriptors express the information about interest points as vectors that contain information about the point and its surroundings [34]. Third, the images are matched: the descriptor vectors of the object image and of the new input (or original) image are compared, and the matching score is determined based on the distance between the vectors [34]. Finally, the matching result is output [5].
Figure 1. Image matching process [5].
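The matching step, in which descriptor vectors are compared by distance, can be sketched as follows. This is a simplified illustration under stated assumptions, not the matcher used in the experiments: descriptors are assumed to be binary and stored as Python integers, the distance is the Hamming distance, and the helper names and the `max_distance` threshold are hypothetical.

```python
def hamming(d1, d2):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(d1 ^ d2).count("1")

def match_descriptors(desc_a, desc_b, max_distance=2):
    """Nearest-neighbor matching of desc_a against desc_b.

    Returns a list of (index_a, index_b, distance); candidates whose best
    distance exceeds max_distance (an assumed threshold) are rejected.
    """
    matches = []
    for i, da in enumerate(desc_a):
        j, dist = min(((j, hamming(da, db)) for j, db in enumerate(desc_b)),
                      key=lambda pair: pair[1])
        if dist <= max_distance:
            matches.append((i, j, dist))
    return matches

# Hypothetical 8-bit descriptors for three keypoints in each image.
desc_first = [0b10110010, 0b01001101, 0b11110000]
desc_second = [0b01001111, 0b10110011, 0b00001111]
matches = match_descriptors(desc_first, desc_second)
```

In this toy case the first two keypoints find a partner at a distance of one bit, while the third has no candidate closer than three bits and is left unmatched, which mirrors how a distance threshold suppresses weak correspondences.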

Overview of Selected Matching Algorithms
Binary Robust Invariant Scalable Keypoints (BRISK). The BRISK algorithm is a scale- and rotation-invariant feature point detection and description algorithm. It obtains a binary feature descriptor by comparing the grayscale values of point pairs in the local neighborhood of the keypoint [38]. In BRISK, keypoint detection is scale-aware: interest points are identified across both the image and scale dimensions using a saliency criterion [39]. The detectBRISKFeatures function returns a BRISKPoints object, points. The object includes information about the BRISK features detected in a 2D grayscale input image [40].
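The idea of building a binary descriptor from grayscale comparisons of point pairs can be sketched as below. This is a deliberately reduced illustration and not the BRISK descriptor itself: the actual algorithm samples smoothed intensities on a fixed concentric pattern and normalizes for rotation, whereas the patch, the pair list, and the function name here are assumptions chosen for the example.

```python
def binary_descriptor(patch, pairs):
    """Build a bit string (as an int) from pairwise grey-value comparisons.

    Each pair ((r1, c1), (r2, c2)) contributes one bit: 1 if the first
    sample is brighter than the second, otherwise 0.
    """
    bits = 0
    for (r1, c1), (r2, c2) in pairs:
        bits = (bits << 1) | (1 if patch[r1][c1] > patch[r2][c2] else 0)
    return bits

# Hypothetical 3x3 neighborhood of a keypoint and four sampling pairs.
patch = [
    [12, 40, 80],
    [15, 60, 90],
    [10, 30, 70],
]
pairs = [((0, 0), (0, 2)), ((1, 1), (2, 0)), ((0, 2), (2, 2)), ((2, 1), (1, 1))]
descriptor = binary_descriptor(patch, pairs)

# Adding a constant brightness offset leaves every comparison unchanged.
brighter_patch = [[v + 20 for v in row] for row in patch]
descriptor_brighter = binary_descriptor(brighter_patch, pairs)
```

Since only the ordering of grey values enters the descriptor, a uniform brightness change yields the same bits, which is one reason binary descriptors of this kind are robust to lighting differences between the two images.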

Features from Accelerated Segment Test (FAST). Rosten and Drummond proposed FAST as an algorithm for identifying interest points in an image [41]. The FAST corner detector finds candidate points by performing a segment test on each pixel in the image, examining a Bresenham circle of 16 pixels (radius 3) around the corner candidate pixel. If a set of adjacent pixels on the Bresenham circle are all brighter than the candidate pixel intensity plus a threshold value, or all darker than the candidate pixel intensity minus the threshold value, the candidate pixel is considered a corner [6,25,42,43]. The detectFASTFeatures function returns a cornerPoints object, points, which contains information about the feature points found in a 2D grayscale input image [44].

Minimum Eigenvalue Algorithm (MinEigen). This algorithm was developed by Shi and Tomasi [45]. In the minimum eigenvalue algorithm (also called Kanade-Lucas-Tomasi (KLT)), corner detection is based on measuring each pixel's corner response by determining the change in intensity caused by shifting a local integration window in all directions; corner pixels give peaks in the corner response [25]. The detectMinEigenFeatures function returns a cornerPoints object, points, which stores data about the feature points found in a 2D grayscale input image [45].
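Returning to the FAST segment test described above, the check on a single candidate pixel can be sketched as follows. The required run length `n=9` and the intensity threshold of 20 are assumptions, since the text does not state the values used; the helper names and the synthetic 7x7 images are likewise hypothetical, and the sketch omits the speed-up tests and non-maximum suppression of a full detector.

```python
# Offsets (dx, dy) of the 16 pixels on a Bresenham circle of radius 3,
# listed in order around the circle.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def longest_run(flags):
    """Length of the longest run of True values on a circular list."""
    if all(flags):
        return len(flags)
    best = run = 0
    for flag in flags + flags:  # the doubled list handles wrap-around
        run = run + 1 if flag else 0
        best = max(best, run)
    return best

def is_fast_corner(image, row, col, threshold=20, n=9):
    """Segment test: n contiguous circle pixels all brighter or all darker."""
    center = image[row][col]
    ring = [image[row + dy][col + dx] for dx, dy in CIRCLE]
    brighter = [v > center + threshold for v in ring]
    darker = [v < center - threshold for v in ring]
    return longest_run(brighter) >= n or longest_run(darker) >= n

def make_image(is_dark):
    """7x7 synthetic image: grey value 100 where is_dark(r, c), else 200."""
    return [[100 if is_dark(r, c) else 200 for c in range(7)] for r in range(7)]

corner_img = make_image(lambda r, c: r >= 3 and c >= 3)  # dark quadrant
edge_img = make_image(lambda r, c: r >= 3)               # dark lower half
flat_img = make_image(lambda r, c: False)                # uniform

corner_found = is_fast_corner(corner_img, 3, 3)
edge_found = is_fast_corner(edge_img, 3, 3)
flat_found = is_fast_corner(flat_img, 3, 3)
```

At the tip of the dark quadrant, 11 contiguous circle pixels are brighter than the center, so the pixel passes the test; on a straight edge only 7 are, and in a flat region none are, so neither is reported as a corner.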
Harris. Instead of using shifted patches, Harris and Stephens (1988) improved Moravec's corner detector by considering the differential of the corner score with respect to direction directly [41]. The Harris method uses discrete features of an image [39]. Harris uses a local self-correlation function to measure local changes in the image with patches shifted by a small amount in various directions, combining a corner detector and an edge detector [25]. The Harris algorithm essentially satisfies the corner detection criteria, so it performs well and is commonly used. The color image is first converted to a grayscale image; feature points are then extracted by analyzing the change in the image grey values against a certain threshold [46]. The detectHarrisFeatures function returns a cornerPoints object. The object stores information about the feature points found in a two-dimensional input image [47]. Table 2 lists the functions of the algorithms with their detectors, feature types, and scale independence.
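The Harris and minimum eigenvalue detectors can both be expressed through the same 2x2 matrix of summed gradient products, which the following sketch makes explicit. It is a simplified illustration, not the toolbox implementation: gradients are taken by central differences, the window is unweighted, the Harris constant `k = 0.04` is a commonly used value assumed here, and the function names and synthetic images are hypothetical.

```python
import math

def structure_tensor(image, row, col, half=1):
    """Sum the gradient products over a (2*half+1)^2 window around (row, col)."""
    sxx = sxy = syy = 0.0
    for r in range(row - half, row + half + 1):
        for c in range(col - half, col + half + 1):
            ix = (image[r][c + 1] - image[r][c - 1]) / 2.0  # horizontal gradient
            iy = (image[r + 1][c] - image[r - 1][c]) / 2.0  # vertical gradient
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
    return sxx, sxy, syy

def harris_response(sxx, sxy, syy, k=0.04):
    """Harris and Stephens corner measure: det(M) - k * trace(M)^2."""
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace

def min_eigenvalue(sxx, sxy, syy):
    """Smaller eigenvalue of M, the Shi and Tomasi corner measure."""
    trace = sxx + syy
    return (trace - math.sqrt((sxx - syy) ** 2 + 4.0 * sxy * sxy)) / 2.0

def make_image(is_dark):
    """7x7 synthetic image: grey value 100 where is_dark(r, c), else 200."""
    return [[100 if is_dark(r, c) else 200 for c in range(7)] for r in range(7)]

corner_img = make_image(lambda r, c: r >= 3 and c >= 3)  # dark quadrant
edge_img = make_image(lambda r, c: r >= 3)               # dark lower half

corner_m = structure_tensor(corner_img, 3, 3)
edge_m = structure_tensor(edge_img, 3, 3)
harris_corner, harris_edge = harris_response(*corner_m), harris_response(*edge_m)
eig_corner, eig_edge = min_eigenvalue(*corner_m), min_eigenvalue(*edge_m)
```

On the synthetic corner both measures are clearly positive, whereas on the straight edge the Harris response is negative and the smaller eigenvalue is zero, so thresholding either measure separates corners from edges.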

Methodology
This work is carried out in three main phases: an input phase, a process phase, and an output and analysis phase, as shown in Figure 2. In the input phase, two images with an overlapping area are acquired and loaded into the software used. In the process phase, the pair of images is matched manually, to match specific target points, using the LPS project management software, and automatically, based on the feature-based matching (FBM) method, by matching local corner features in MATLAB. In auto-matching, the images are read and converted to grayscale; features, including the target points, are then extracted (detected and described) and matched with the BRISK, FAST, Harris, and MinEigen algorithms. In the third phase, the image coordinates of the auto-matched features and of the target points are obtained from the automatic and manual matches. The image coordinates of the target points are then transformed to metric coordinates. Finally, the total root mean square error (RMSE) is determined and evaluated.

Experimental Work and Results
The wide-angle camera of an iPhone 7 Plus mobile phone is used in this work. The technical characteristics of this camera are detailed in Table 3. Two images with an overlap of about 85% were acquired in JPG format, as shown in Figure 3. Manual and automatic matching are performed on these images to match particular target points that appear in the overlapping area of both images. Twenty-two distributed target points are selected to be matched. First, manual matching is applied using the LPS project management software, as shown in Figure 4. The image coordinates of the 22 target points are obtained and then transformed to metric coordinates using Eq. 1 and Eq. 2, as shown in Table 5. Second, auto-matching based on the feature-based matching (FBM) method with local point (corner) features is carried out on the images using the selected feature matching algorithms in MATLAB: BRISK, FAST, Harris, and MinEigen. Auto-matching is implemented with each of these algorithms, as shown in Figures 5 to 8. Initially, the original images are loaded and read and then converted from RGB to grayscale. After that, the features in the reference image (first image) are detected and described by these algorithms. Finally, the feature matching process is conducted. The number of auto-matched features for each algorithm is given in Table 4. The image coordinates of the target points auto-matched by the BRISK, FAST, Harris, and MinEigen algorithms are also converted to metric coordinates using Eq. 1 and Eq. 2, as given in Tables 6 to 9, respectively. The total root mean square error (RMSE) is determined with Eq. 3 by comparing the photo coordinates of the auto-matched target points of each algorithm with the manually matched photo coordinates of these points; the results are listed in Table 10 and illustrated in Figures 9 and 10.
In Eq. 1 and Eq. 2, (Xmm, Ymm) are the photo coordinates of a point in mm, (Xp, Yp) are the image coordinates of the point in pixels, (Xc, Yc) is the center of the image with Xc = Ncol/2 and Yc = Nrow/2, ws is the width of the sensor in mm, hs is the height of the sensor in mm, Ncol is the number of columns in the image, and Nrow is the number of rows in the image [48].
In Eq. 3, RMSE is the root mean square error in mm, (XMi, YMi) are the photo coordinates of the manually matched target points in mm, (XAi, YAi) are the photo coordinates of the auto-matched target points in mm, and n is the number of target points [49].
Figure 3. Original images.
Figure 9. Total RMSE for the first image.
Figure 10. Total RMSE for the second image.
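The coordinate transformation and the total RMSE described by the symbol definitions above can be sketched as follows. This is one plausible reading and not a reproduction of Eq. 1 to Eq. 3: the pixel size is taken as sensor dimension divided by pixel count, the Y axis is assumed to be flipped so that it points upward, and the total RMSE is assumed to combine the X and Y residuals of all n points. The image size, sensor size, and point coordinates are hypothetical placeholders, not the values of Table 3 or Tables 5 to 9.

```python
import math

def pixel_to_mm(xp, yp, n_col, n_row, ws, hs):
    """Convert image coordinates (pixels) to photo coordinates (mm).

    Assumed form: shift the origin to the image center and scale by the
    pixel size; the sign of Y is flipped so that Y increases upward.
    """
    xc, yc = n_col / 2.0, n_row / 2.0
    x_mm = (xp - xc) * ws / n_col
    y_mm = (yc - yp) * hs / n_row
    return x_mm, y_mm

def total_rmse(manual_pts, auto_pts):
    """Assumed total RMSE: sqrt(sum((XM - XA)^2 + (YM - YA)^2) / n)."""
    n = len(manual_pts)
    squared = sum((xm - xa) ** 2 + (ym - ya) ** 2
                  for (xm, ym), (xa, ya) in zip(manual_pts, auto_pts))
    return math.sqrt(squared / n)

# Hypothetical camera: 4000 x 3000 pixels on a 4.8 mm x 3.6 mm sensor.
N_COL, N_ROW, WS, HS = 4000, 3000, 4.8, 3.6

center_mm = pixel_to_mm(2000, 1500, N_COL, N_ROW, WS, HS)
point_mm = pixel_to_mm(3000, 1000, N_COL, N_ROW, WS, HS)

# Hypothetical manual and automatic pixel coordinates of two target points.
manual_px = [(1200.0, 800.0), (2500.0, 1900.0)]
auto_px = [(1203.0, 804.0), (2500.0, 1900.0)]
manual_mm = [pixel_to_mm(x, y, N_COL, N_ROW, WS, HS) for x, y in manual_px]
auto_mm = [pixel_to_mm(x, y, N_COL, N_ROW, WS, HS) for x, y in auto_px]
rmse_mm = total_rmse(manual_mm, auto_mm)
```

With a pixel size of 0.0012 mm in this hypothetical setup, a disagreement of a few pixels on one point gives an RMSE of a few thousandths of a millimetre, which is the order of magnitude of the values reported in Table 10.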

Discussion
Table 4 shows that the BRISK algorithm matches 2942 point (corner) features, the FAST algorithm 1073, the Harris algorithm 378, and the MinEigen algorithm 1036. Not all of the manually matched target points are auto-matched by every algorithm. The BRISK and FAST algorithms match all 22 target points, whereas the Harris algorithm auto-matches only 3 points (points 10, 17, and 21) and the MinEigen algorithm auto-matches 9 points (points 1, 2, 3, 6, 9, 10, 12, 17, and 21). The total RMSE, in mm, of each algorithm relative to manual matching is listed for both images in Table 10 and graphed in Figures 9 and 10. In the first image, the total RMSE between the manual match and the BRISK, FAST, Harris, and MinEigen auto-matches is 0.004283209 mm, 0.002651206 mm, 0.002845174 mm, and 0.00530099 mm, respectively. In the second image, the corresponding values are 0.004376962 mm, 0.003137473 mm, 0.002399477 mm, and 0.005741377 mm.
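The comparison above can be checked with a few lines of code. The sketch below only restates the RMSE values and the numbers of auto-matched target points quoted in this section; the dictionary layout and variable names are assumptions made for the example.

```python
# Total RMSE in mm (first image, second image) and number of auto-matched
# target points per algorithm, as quoted in the discussion.
results = {
    "BRISK": {"rmse": (0.004283209, 0.004376962), "targets": 22},
    "FAST": {"rmse": (0.002651206, 0.003137473), "targets": 22},
    "Harris": {"rmse": (0.002845174, 0.002399477), "targets": 3},
    "MinEigen": {"rmse": (0.00530099, 0.005741377), "targets": 9},
}

def extreme(image_index, pick):
    """Algorithm with the smallest or largest total RMSE in one image."""
    return pick(results, key=lambda name: results[name]["rmse"][image_index])

best_first, best_second = extreme(0, min), extreme(1, min)
worst_first, worst_second = extreme(0, max), extreme(1, max)

# Algorithms that auto-matched every one of the 22 target points.
complete = sorted(name for name, r in results.items() if r["targets"] == 22)
```

When reading the ranking, it may be worth keeping in mind that the Harris and MinEigen values rest on only 3 and 9 target points, while the BRISK and FAST values rest on all 22.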

Conclusions
In this paper, the number of auto-matched features is assessed for the detection, description, and matching of local corner features using the BRISK, FAST, Harris, and MinEigen algorithms based on FBM. The total RMSE is also calculated and evaluated by comparing 22 manually matched target points with the results of each of these algorithms. The largest number of auto-matched points is 2942 with BRISK, whereas Harris gives the fewest, 378, which means that the BRISK algorithm detects, describes, and matches the most features. As the photo coordinate tables show, all target points are auto-matched by the BRISK and FAST algorithms, while 9 target points are auto-matched by the MinEigen algorithm and only 3 by Harris. To calculate the total RMSE, the metric coordinates are first determined for the auto-matched and manually matched target points. In the first image, the maximum RMSE is 0.00530099 mm, between the MinEigen algorithm and the manual match, and the minimum RMSE is 0.002651206 mm, between the FAST algorithm and the manual match. In the second image, the maximum RMSE is 0.005741377 mm, between the MinEigen algorithm and the manual match, and the minimum RMSE is 0.002399477 mm, between the Harris algorithm and the manual match. Overall, the MinEigen algorithm gives the maximum RMSE relative to the manual match in both images. The results obtained from matching stereo images can be utilized in digital photogrammetry to build and reconstruct a 3D model from 2D stereo images.