Efficient Feature Matching and Pose-graph Initialization for SfM

Wentian Gan; Zongqian Zhan; Xin Wang


Introduction


In the last few decades, Structure-from-Motion (SfM) has been intensively studied in the fields of computer vision and photogrammetry. However, it is still challenging to handle large-scale datasets (such as crowdsourced or UAV images), for which feature matching and pose-graph generation are the key bottlenecks in terms of time efficiency.

Similar to the goal of Barath et al. (2021), we propose a method to improve the feature matching and initial pose-graph generation for SfM (the pose graph consists of images as nodes and, as edges, the image pairs that survive two-view geometric verification). In contrast to Barath et al. (2021), we initialize the pose graph with several orthogonal minimum spanning trees (MSTs) to build a more complete graph; pose-graph generation then starts from the two-view epipolar geometries of the image pairs lying on these orthogonal MSTs.

The overall workflow of our method is shown in Fig. 1; the green parts denote our main contributions.


Methods


The goal of this work is to propose a solution that significantly improves the time efficiency of pose-graph generation for an image dataset while maintaining the performance of the subsequent SfM processing, i.e., our method is expected to yield the input for SfM more time-efficiently.

First, to generate a more complete pose graph faster, several orthogonal MSTs are built from the image similarity degrees as a pre-built initialization of the pose graph.



We propose to use multiple orthogonal MSTs to generate a robust pose graph and further improve the time efficiency ("orthogonal" means that no two MSTs share any edge). To obtain one MST, firstly, each image is considered as the root node of a separate tree in the graph G, and each edge (image pair) connects two such nodes. Secondly, the edge (image pair) with the highest similarity degree is retrieved from the estimated similarity matrix; if its two vertices belong to different trees, the two trees are merged and the edge is added to the MST being built, otherwise the image pair with the next highest similarity degree is processed. This second step is repeated until only one tree is left in the graph G, which is the final MST; in other words, a set of edges T = {(i, j) | i, j ∈ G} that connects all images with the lowest cost is found. After the first MST is built, all of its edges are added to an excluded set, and multiple orthogonal MSTs are obtained by repeating the above steps several times.
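This construction is essentially a Kruskal-style procedure with an excluded-edge set. The following is a minimal sketch in Python of how it could look, assuming the similarity matrix is given as an n × n NumPy array; the function name build_orthogonal_msts, the similarity threshold argument and the union-find details are illustrative and not taken verbatim from our implementation.

    import numpy as np

    def build_orthogonal_msts(sim, num_trees=3, min_sim=0.5):
        """Kruskal-style construction of several edge-disjoint ("orthogonal")
        maximum-similarity spanning trees from an n x n similarity matrix."""
        n = sim.shape[0]
        # candidate edges (i, j) with i < j, sorted by descending similarity
        edges = [(sim[i, j], i, j) for i in range(n) for j in range(i + 1, n)
                 if sim[i, j] > min_sim]
        edges.sort(reverse=True)

        excluded = set()                  # edges already used by a previous tree
        trees = []
        for _ in range(num_trees):
            parent = list(range(n))       # union-find: every image starts as its own root

            def find(x):
                while parent[x] != x:
                    parent[x] = parent[parent[x]]
                    x = parent[x]
                return x

            mst = []
            for s, i, j in edges:
                if (i, j) in excluded:
                    continue
                ri, rj = find(i), find(j)
                if ri != rj:              # the two vertices lie in different trees
                    parent[ri] = rj       # merge the two trees
                    mst.append((i, j))
                    excluded.add((i, j))  # keep later MSTs edge-disjoint
                if len(mst) == n - 1:
                    break
            # if the threshold or the exclusions disconnect the graph,
            # the result is a spanning forest rather than a full tree
            trees.append(mst)
        return trees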


Second, to verify unknown two-view geometry more time-efficiently, a heuristic A* algorithm is employed to find a visible path between the two views, based on which the corresponding two-view geometry is estimated via propagation along the path.
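As an illustration of this step, the sketch below runs a heuristic A* search over the already-verified part of the pose graph and then chains the relative poses along the returned path. It assumes the verified relative rotation and (unit-norm) translation direction are stored per ordered image pair, uses 1 − similarity as both edge cost and goal heuristic, and ignores the unknown per-edge scale, so the propagated geometry only serves as a prior for the subsequent verification; the function names and the particular heuristic are assumptions made for illustration.

    import heapq
    import numpy as np

    def astar_path(graph, sim, src, dst):
        """A* search over the verified pose graph.
        graph: dict node -> set of neighbouring nodes (verified edges)
        sim:   n x n similarity matrix; edge cost is 1 - similarity and
               1 - sim(node, dst) is used as the (heuristic) cost-to-go."""
        open_set = [(1.0 - sim[src, dst], 0.0, src, [src])]
        closed = set()
        while open_set:
            f, g, node, path = heapq.heappop(open_set)
            if node == dst:
                return path
            if node in closed:
                continue
            closed.add(node)
            for nb in graph[node]:
                if nb in closed:
                    continue
                g_nb = g + (1.0 - sim[node, nb])
                heapq.heappush(open_set,
                               (g_nb + (1.0 - sim[nb, dst]), g_nb, nb, path + [nb]))
        return None   # the two views are not connected yet

    def propagate_two_view_geometry(path, rel_rot, rel_trans):
        """Chain the relative poses along the path to predict R, t between its
        endpoints (t only up to scale), then form the essential matrix E = [t]x R."""
        R = np.eye(3)
        t = np.zeros(3)
        for a, b in zip(path[:-1], path[1:]):
            R_ab, t_ab = rel_rot[(a, b)], rel_trans[(a, b)]
            t = R_ab @ t + t_ab          # per-edge scale is unknown; prior only
            R = R_ab @ R
        tx = np.array([[0.0, -t[2], t[1]],
                       [t[2], 0.0, -t[0]],
                       [-t[1], t[0], 0.0]])
        return R, t, tx @ R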

Third, to speed up feature matching, a hashing strategy based on epipolar lines (calculated from the estimated relative pose) is utilized to narrow down the candidate matchable points.



Based on the essential matrix estimated in the previous step (two-view geometry propagation), we use a method called epipolar hash to find the candidate points that lie near the projected epipolar lines. Owing to the characteristics of epipolar geometry, the image plane is divided into angular bins around the epipole: each bin shares the same epipole, and the bins evenly partition the orientation range [0, π) (or a restricted range [a, b]) into #bins sectors. The bin that is hit by the epipolar line is the candidate bin to be explored, and feature matching is finally performed by considering only the features within that candidate bin.
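A minimal sketch of such an angular binning is given below. It assumes the fundamental/essential matrix maps a point in one image to its epipolar line in the other image, that the epipole of that image has already been computed, and that checking one neighbouring bin on each side is enough to absorb noise; the function names and the number of bins are illustrative.

    import numpy as np

    def bin_features_by_epipolar_angle(keypoints, epipole, num_bins=64):
        """Pre-bucket the keypoints of one image into angular bins around the epipole.
        Every epipolar line passes through the epipole, so a line corresponds to a
        single orientation and only the features in the matching bin need testing."""
        d = keypoints - epipole                          # (N, 2) offsets from the epipole
        angles = np.arctan2(d[:, 1], d[:, 0]) % np.pi    # orientation in [0, pi)
        bin_ids = (angles / np.pi * num_bins).astype(int)
        buckets = {}
        for idx, b in enumerate(bin_ids):
            buckets.setdefault(b, []).append(idx)
        return buckets

    def candidate_matches(x_src, F, buckets, num_bins=64, neighbour=1):
        """Indices of features lying in the bin(s) hit by the epipolar line of the
        source point x_src (given as a homogeneous 3-vector)."""
        l = F @ x_src                                    # epipolar line a*x + b*y + c = 0
        angle = np.arctan2(-l[0], l[1]) % np.pi          # orientation of that line
        hit = int(angle / np.pi * num_bins)
        cand = []
        for db in range(-neighbour, neighbour + 1):      # also check adjacent bins
            cand.extend(buckets.get((hit + db) % num_bins, []))
        return cand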


Experiments

To validate the performance of the proposed method, we test it on several image datasets. To evaluate time efficiency, two popular packages (Colmap and openMVG) and one state-of-the-art method, Barath et al. (2021), are compared (section 4.2). Finally, the results of SfM are reported using the evaluation metrics of mean track length, mean reprojection error and number of registered images.

Our method is tested on three image datasets from Knapitsch et al. (2017), named Family, Lighthouse and Playground; the table below lists the image size and number of images of these three datasets.

Dataset      Image Size    Number of Images
Family       1920 × 1080   152
Lighthouse   2048 × 1080   200
Playground   1920 × 1080   307




Before testing our method, the first step is to compute the similarity degree matrix among images, which is normalized into (0, 1) and has dimensions n × n; a higher value indicates that the two corresponding images are more likely to overlap with each other. To do this, we use ResNet50 as the backbone and GeM (Radenović et al., 2018) as the aggregation layer, pre-trained on GLD-v1 (Noh et al., 2017), to extract a global feature per image. The similarity degree is then computed via the cosine distance; in our experiments, image pairs with a similarity over 0.5 are eligible for subsequent processing, and 3 orthogonal MSTs are used to generate the starting pose graph.
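For illustration, the similarity matrix and the eligible image pairs can be computed as in the following sketch, assuming the n global descriptors are stacked into an (n, d) NumPy array; since GeM descriptors are non-negative, the cosine similarity of the normalized rows already falls into [0, 1]. The helper names are illustrative.

    import numpy as np

    def similarity_matrix(desc):
        """desc: (n, d) array with one global descriptor per image
        (e.g. the ResNet50 + GeM embedding). Returns the n x n matrix of
        cosine similarities between L2-normalised rows."""
        d = desc / np.linalg.norm(desc, axis=1, keepdims=True)
        sim = d @ d.T
        np.fill_diagonal(sim, 0.0)       # ignore self-similarity
        return sim

    def eligible_pairs(sim, threshold=0.5):
        """Image pairs whose similarity exceeds the threshold used in our experiments."""
        i, j = np.where(np.triu(sim, k=1) > threshold)
        return list(zip(i.tolist(), j.tolist()))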

Performance of time efficiency
Dataset      Method                  Feature extraction time (s)   Pose graph time (s)
Family       our method              128.25                        106.14
             Barath et al. (2021)    91.83                         305.82
             openMVG                 273.99                        139.13
             Colmap                  13.98 (GPU)                   381.36
Lighthouse   our method              121.18                        104.12
             Barath et al. (2021)    120.00                        149.47
             openMVG                 172.13                        115.40
             Colmap                  15.96 (GPU)                   439.68
Playground   our method              189.06                        277.31
             Barath et al. (2021)    180.56                        391.10
             openMVG                 331.21                        308.24
             Colmap                  31.86 (GPU)                   1465.32


We can see that our method is always the fastest at generating the pose graph and exhibits a significant reduction in processing time compared to Colmap. Colmap is the slowest at computing the pose graph, primarily due to its brute-force (BF) matching strategy. The second fastest pipeline is openMVG, which utilizes the Fast-Cascade-Hashing-L2 matching algorithm; while this approach brings a significant speed improvement, it typically requires a greater amount of memory (Cheng et al., 2014). As the feature extraction step in Colmap benefits from GPU acceleration, the total time of the initialization part is not included in the comparative analysis. Compared to Barath et al. (2021), our method is in general faster, which can be explained by the benefit of the initialization stage using orthogonal MSTs.

Performance of SfM

The performance of SfM relies on the quality of the feature matching and two-view geometric results. Therefore, in this section, the popular framework openMVG is employed as the SfM engine, which takes the output of our method and of Barath et al. (2021) as input to estimate image poses and a sparse 3D point cloud. Three evaluation metrics are computed: mean track length (MTL), mean reprojection error (MRE) and registered image number (RIN); a short sketch of how they are derived from a reconstruction is given after the table.
Dataset      Method                  MTL    MRE (px)   RIN
Family       our method              7.53   0.50       152/152
             Barath et al. (2021)    3.80   0.83       141/152
             openMVG                 7.02   0.67       152/152
             Colmap                  8.06   0.87       152/152
Lighthouse   our method              5.41   0.61       198/200
             Barath et al. (2021)    5.35   0.63       186/200
             openMVG                 5.21   0.90       196/200
             Colmap                  7.74   0.66       200/200
Playground   our method              5.73   0.49       307/307
             Barath et al. (2021)    5.67   0.68       307/307
             openMVG                 4.20   0.99       302/307
             Colmap                  6.03   0.65       307/307
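For clarity, the sketch below shows how these metrics can be derived from a generic reconstruction, assuming the per-point track lengths and per-observation reprojection errors are available; it is not tied to the openMVG data structures, and the function name is illustrative.

    import numpy as np

    def sfm_metrics(track_lengths, reproj_errors, registered, total_images):
        """track_lengths: number of image observations of each 3D point
        reproj_errors:  per-observation reprojection errors in pixels
        registered:     number of images with an estimated pose."""
        mtl = float(np.mean(track_lengths))      # mean track length (MTL)
        mre = float(np.mean(reproj_errors))      # mean reprojection error (MRE, px)
        rin = f"{registered}/{total_images}"     # registered image number (RIN)
        return mtl, mre, rin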


It can be observed that the method proposed in this paper exhibits the highest reconstruction accuracy in terms of MRE, while the pipeline based on Colmap always exhibits the longest MTL. In terms of the number of successfully registered images, the proposed method is only slightly inferior to Colmap. For Colmap, the inherent feature extraction module is used, which generates approximately 3000 more feature points per image; as a result, the corresponding MTL and RIN are always the best, since more correspondences are found and more edges are likely to pass the two-view geometric verification. Compared to the original pose-graph method of Barath et al. (2021), our method shows superior results on all three metrics, which demonstrates the efficacy of the proposed integration of multiple orthogonal MSTs. The visual SfM reconstruction results are shown in the figures below.

About us

If you have any questions or advice, you can contact us through the following address: Thanks for your support!