LVG-SfM: Learning-based view-graph generation for robust on-the-fly SfM

Wentian Gan¹, Yifei Yu¹, Giulio Perda², Luca Morelli², Rui Xia¹, Zongqian Zhan¹, Xin Wang¹*, Fabio Remondino²

¹ Wuhan University, Wuhan, China
² 3D Optical Metrology (3DOM) unit, Bruno Kessler Foundation (FBK), Trento, Italy

Most SfM methods operate offline, whereas demand is increasing for real-time applications such as rapid disaster response, online measurement and collaborative 3D mapping. Many researchers have therefore investigated online (or real-time) SfM solutions that aim to estimate camera poses and a sparse point cloud at speeds comparable to the image capture rate.
Supported by recent advances in learning-based feature extraction, matching and outlier detection, a more robust view-graph can be constructed, which significantly enhances the performance of online SfM. This paper presents a new real-time SfM solution, named LVG-SfM, which integrates three processes:

Learning-based image correspondence generation: we use learning-based local features and matchers, such as SuperPoint, DISK, ALIKE, ALIKED, SuperGlue and LightGlue, to extract and match sufficient and robust correspondences even in poorly textured scenes.
Learning-based view-graph robustification for ambiguous edge elimination: after two-view geometric verification, we use Doppelgangers to further prune the view-graph by eliminating ambiguous edges caused by repetitive structures.
LVG-SfM: the proposed method builds upon on-the-fly SfMv2 to offer an advanced and robust real-time multi-agent SfM pipeline able to handle ambiguous image sequences with repetitive structures as well as poorly textured scenes.
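
The view-graph pruning step in the second item can be illustrated with a small sketch. This is not the LVG-SfM implementation: `prune_view_graph`, the `score_pair` callback and the 0.5 threshold are hypothetical names and values chosen for illustration. In the real pipeline the score would come from the Doppelgangers classifier, applied to pairs that have already passed two-view geometric verification.

```python
from typing import Callable, Dict, Set, Tuple

Edge = Tuple[str, str]


def prune_view_graph(
    verified_edges: Set[Edge],
    score_pair: Callable[[str, str], float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Set[Edge]:
    """Keep only the geometrically verified edges that the pairwise
    classifier scores as a true overlap (score >= threshold)."""
    return {
        (a, b) for (a, b) in verified_edges if score_pair(a, b) >= threshold
    }


# Toy scores standing in for a Doppelgangers-style classifier (assumption).
toy_scores: Dict[Edge, float] = {
    ("img_a", "img_b"): 0.93,  # true overlap
    ("img_b", "img_c"): 0.88,  # true overlap
    ("img_a", "img_d"): 0.12,  # look-alike structure, ambiguous edge
}

kept_edges = prune_view_graph(set(toy_scores), lambda a, b: toy_scores[(a, b)])
```

In this toy graph the edge between `img_a` and `img_d` is dropped, so a later reconstruction step would not try to connect two different but similar-looking parts of the scene.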

Paper
We use local feature extractors such as SuperPoint, DISK and ALIKED, and feature matchers such as SuperGlue and LightGlue, to extract sufficient and robust correspondences. Then, for each new image, we apply the original retrieval module of on-the-fly SfM, with a pre-trained global feature extractor and HNSW, to select up to 30 of the most similar image pairs. We also use Doppelgangers to distinguish truly overlapping image pairs from ambiguous ones. If more than two pairs remain after disambiguation, the newly captured image is solved with the on-the-fly SfM pipeline and added to the "registered image" set; otherwise, it is inserted into the "not registered image" set. Finally, the different agents involved in the surveying operation can work simultaneously on separate parts of the scene, leading to non-overlapping image subsets. For each of these subsets, a distinct submap is created and updated in parallel. When the number of common (overlapping) images between different submaps reaches a threshold, the submaps are merged using the solution described in on-the-fly SfM.
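
The per-image decision and the submap merge trigger described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the project's code: `Submap`, `process_new_image`, `should_merge` and the `is_true_overlap` callback are hypothetical names, the candidate list is assumed to be already ranked by the retrieval module, and the actual pose estimation is replaced by simple set bookkeeping. The merge threshold `min_common` is left as a parameter because its value is not stated here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set

MAX_CANDIDATES = 30  # "up to 30 of the most similar image pairs"
MIN_VALID_PAIRS = 2  # register only if more than two pairs survive


@dataclass
class Submap:
    registered: Set[str] = field(default_factory=set)
    not_registered: Set[str] = field(default_factory=set)


def process_new_image(
    image: str,
    ranked_candidates: List[str],  # most similar first (assumed given)
    is_true_overlap: Callable[[str, str], bool],  # disambiguation stand-in
    submap: Submap,
) -> bool:
    """Return True if the image goes to the registered set, else False."""
    candidates = ranked_candidates[:MAX_CANDIDATES]
    surviving = [c for c in candidates if is_true_overlap(image, c)]
    if len(surviving) > MIN_VALID_PAIRS:
        # The real pipeline would solve the image pose here.
        submap.registered.add(image)
        return True
    submap.not_registered.add(image)
    return False


def should_merge(a: Submap, b: Submap, min_common: int) -> bool:
    """Merge trigger: enough images are shared by both submaps."""
    return len(a.registered & b.registered) >= min_common


submap = Submap(registered={"img_1", "img_2", "img_3"})
neighbours = ["img_1", "img_2", "img_3"]
accepted = process_new_image("img_4", neighbours, lambda new, old: True, submap)
rejected = process_new_image(
    "img_5", neighbours, lambda new, old: old == "img_1", submap
)
```

Here `img_4` keeps three pairs after disambiguation and is registered, while `img_5` keeps only one pair and is placed in the "not registered image" set.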

Performance on poor texture sequences

Reconstruction results on poor-texture datasets with various learning-based methods, together with the quantitative results, are shown below. Blunders are highlighted in circles.

Performance of disambiguation on repetitive structures

Camera poses and view-graph reconstruction results for datasets with repetitive structures are shown below. Blunders are highlighted in circles.

Our method has also been applied in practice. The work titled "Exploring the potential of collaborative UAV 3D mapping in Kenyan savanna for wildlife research" uses the system we developed to support wildlife conservation and ecological monitoring.

If you have any questions or suggestions, you can contact us at the following addresses:

xwang@sgg.whu.edu.cn, Xin Wang, Wuhan University
zqzhan@sgg.whu.edu.cn, Zongqian Zhan, Wuhan University
yfyu2020@whu.edu.cn, Yifei Yu, Wuhan University
xiarui@whu.edu.cn, Rui Xia, Wuhan University
gwt2019@whu.edu.cn, Wentian Gan, Wuhan University

Thanks for your support!