Acta Geodaetica et Cartographica Sinica ›› 2025, Vol. 54 ›› Issue (9): 1633-1646. doi: 10.11947/j.AGCS.2025.20230306

• Photogrammetry and Remote Sensing •

An intelligent 3D reconstruction framework via deep learning based multi-view image matching

Shunping JI1, Jin LIU1,2, Jian GAO1, Jianya GONG1

  1. School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China
  2. School of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, China
  • Received: 2023-12-31 Revised: 2025-08-07 Online: 2025-10-10 Published: 2025-10-10
  • Contact: Jin LIU E-mail: jishunping@whu.edu.cn; liujinwhu@whu.edu.cn
  • About author: JI Shunping (1979-), male, PhD, professor. His research interests include digital photogrammetry, computer vision, remote sensing image processing, and deep learning. E-mail: jishunping@whu.edu.cn
  • Supported by:
    The National Natural Science Foundation of China (42030102); The National Natural Science Foundation of China (42171430)

Abstract:

Real-scene 3D reconstruction of the ground surface from high-resolution stereo or multi-view images is a key research topic in photogrammetry and computer vision, and dense image matching is its core technology. Mainstream 3D reconstruction algorithms are still based on hand-crafted methods. Although deep learning-based dense matching algorithms have performed very well in recent years, they have not yet been deployed in 3D reconstruction projects, and few 3D reconstruction frameworks or software packages built on deep learning or other intelligent methods have been reported, either in China or internationally. To promote the application of modern artificial intelligence methods in large-scale 3D surface reconstruction, this article proposes Deep3D, a general intelligent framework for real-scene 3D reconstruction whose core component is a deep learning dense matching network. The framework covers the complete workflow of aerial triangulation, optimal view selection, deep learning-based dense matching, depth map fusion, and 3D surface model reconstruction, and targets city-scale real-scene 3D surface reconstruction from multi-view remote sensing images. It handles aerial and satellite images in a unified way by incorporating both the perspective model and the rational polynomial coefficient (RPC) model into the network, and it handles binocular, multi-view, and oblique images through adaptive multi-view alignment and aggregation strategies. Experiments on two sets of oblique aerial images compare Deep3D with commercial software and open-source solutions and show that Deep3D performs on par with or slightly better than the commercial software and far better than existing open-source frameworks. The article also discusses how the different methods perform on multi-view satellite images. This study provides an outlook and a reference for applying deep learning methods in real-scene 3D reconstruction projects.
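
The abstract names two imaging geometries, the perspective model for aerial images and the rational polynomial coefficient (RPC) model for satellite images. The following Python sketch illustrates only those two textbook projection functions; it is not taken from Deep3D. The function names, the dictionary keys, and the choice of the RPC00B monomial ordering are all assumptions made for illustration.

```python
import numpy as np


def project_perspective(K, R, t, X):
    """Pinhole projection x ~ K (R X + t); returns (col, row) in pixels."""
    Xc = R @ np.asarray(X, dtype=float) + t  # object point in the camera frame
    uvw = K @ Xc                             # homogeneous image coordinates
    return uvw[:2] / uvw[2]


def rpc_terms(P, L, H):
    """20 cubic monomials of normalized latitude P, longitude L, height H.

    Ordering assumed to follow the common RPC00B convention.
    """
    return np.array([
        1.0, L, P, H, L * P, L * H, P * H, L * L, P * P, H * H,
        P * L * H, L**3, L * P * P, L * H * H, L * L * P,
        P**3, P * H * H, L * L * H, P * P * H, H**3,
    ])


def project_rpc(rpc, lat, lon, h):
    """RPC projection: each image coordinate is a ratio of two cubic polynomials.

    `rpc` is a hypothetical dict of offsets, scales, and four 20-element
    coefficient arrays; returns (col, row) in pixels.
    """
    # Normalize ground coordinates to roughly [-1, 1].
    P = (lat - rpc["lat_off"]) / rpc["lat_scale"]
    L = (lon - rpc["lon_off"]) / rpc["lon_scale"]
    H = (h - rpc["h_off"]) / rpc["h_scale"]
    m = rpc_terms(P, L, H)
    # Normalized image coordinates as polynomial ratios.
    row_n = (rpc["line_num"] @ m) / (rpc["line_den"] @ m)
    col_n = (rpc["samp_num"] @ m) / (rpc["samp_den"] @ m)
    # De-normalize to pixel coordinates.
    row = row_n * rpc["line_scale"] + rpc["line_off"]
    col = col_n * rpc["samp_scale"] + rpc["samp_off"]
    return np.array([col, row])


def unit(i):
    """Helper for the demo: 20-element coefficient vector with a single 1."""
    e = np.zeros(20)
    e[i] = 1.0
    return e


if __name__ == "__main__":
    # Made-up camera and RPC parameters, for demonstration only.
    K = np.array([[1000.0, 0.0, 500.0], [0.0, 1000.0, 400.0], [0.0, 0.0, 1.0]])
    print(project_perspective(K, np.eye(3), np.zeros(3), [1.0, 2.0, 10.0]))

    toy_rpc = {
        "lat_off": 30.0, "lat_scale": 0.1, "lon_off": 114.0, "lon_scale": 0.1,
        "h_off": 0.0, "h_scale": 100.0,
        "line_off": 1000.0, "line_scale": 1000.0,
        "samp_off": 2000.0, "samp_scale": 2000.0,
        "line_num": unit(2), "line_den": unit(0),  # row depends on P only
        "samp_num": unit(1), "samp_den": unit(0),  # col depends on L only
    }
    print(project_rpc(toy_rpc, 30.05, 114.05, 50.0))
```

Both functions map a ground point to a pixel position, which is the property a matching network needs from either geometry; how Deep3D actually embeds these models in its network is not specified in the abstract.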

Key words: 3D reconstruction framework, deep learning, image dense matching, remote sensing images, real-scene 3D model

CLC Number: