Acta Geodaetica et Cartographica Sinica, 2024, Vol. 53, Issue (12): 2391-2403. doi: 10.11947/j.AGCS.2024.20230579

• Photogrammetry and Remote Sensing •

Deep learning based multi-view dense matching with joint depth and surface normal estimation

Jin LIU1,2, Shunping JI1

  1. School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China
  2. School of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, China
  • Received: 2024-01-04 Published: 2025-01-06
  • Contact: Shunping JI E-mail: liujinwhu@whu.edu.cn; jishunping@whu.edu.cn
  • About author: LIU Jin (b. 1996), female, PhD, specializes in multi-view stereo, dense image matching and 3D reconstruction. E-mail: liujinwhu@whu.edu.cn
  • Supported by:
    The National Natural Science Foundation of China (42171430)

Abstract:

In recent years, deep learning-based multi-view stereo matching methods have demonstrated significant potential in 3D reconstruction tasks. However, they remain limited in recovering the fine geometric details of scenes. In some traditional multi-view stereo matching methods, the surface normal serves as a crucial geometric constraint that supports finer depth inference. Surface normal information, which encodes the local geometry of the scene, has nevertheless not been fully exploited in modern learning-based methods. This paper introduces a deep learning-based joint depth and surface normal estimation method for multi-view dense matching and 3D scene reconstruction. The proposed method employs a multi-stage pyramid structure to infer depth and surface normals simultaneously from multi-view images and to promote their joint optimization. It consists of four modules: a feature extraction module, a normal-assisted depth estimation module, a depth-assisted normal estimation module, and a depth-normal joint optimization module. Specifically, the depth estimation module constructs a geometry-aware cost volume that integrates surface normal information for fine depth estimation. The normal estimation module uses depth constraints to build a local cost volume from which fine-grained normal maps are inferred. The joint optimization module further enhances the geometric consistency between the estimated depth and normals. Experimental results on the WHU-OMVS dataset demonstrate that the proposed method outperforms existing methods in both depth and surface normal estimation. Furthermore, 3D reconstruction results on two different datasets indicate that the proposed method effectively recovers the geometric structure of both local high-curvature areas and global planar regions, yielding well-structured, high-quality 3D scene models.
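
The geometric consistency between depth and surface normal mentioned above rests on a standard relation: a depth map, back-projected through the camera intrinsics, implies a normal at every pixel. The NumPy sketch below illustrates that relation only; it is not the network or the joint optimization module proposed in the paper. The function names, the pinhole intrinsics (fx, fy, cx, cy) and the finite-difference scheme are all assumptions made for illustration.

```python
import numpy as np


def backproject(depth, fx, fy, cx, cy):
    """Lift a depth map (h, w) to camera-space 3D points (h, w, 3).

    Assumes a pinhole camera with the z axis pointing forward.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)


def normals_from_depth(depth, fx, fy, cx, cy):
    """Estimate per-pixel unit normals from depth by finite differences.

    Illustrative only: tangent vectors along image columns and rows are
    crossed, normalized, and oriented to face the camera (negative z).
    """
    pts = backproject(depth, fx, fy, cx, cy)
    du = np.gradient(pts, axis=1)  # tangent along image columns (u)
    dv = np.gradient(pts, axis=0)  # tangent along image rows (v)
    n = np.cross(du, dv)
    n = n / np.maximum(np.linalg.norm(n, axis=-1, keepdims=True), 1e-12)
    n[n[..., 2] > 0] *= -1.0  # make every normal point toward the camera
    return n


def angular_error_deg(n_a, n_b):
    """Per-pixel angle in degrees between two unit-normal maps."""
    cos = np.clip(np.sum(n_a * n_b, axis=-1), -1.0, 1.0)
    return np.degrees(np.arccos(cos))


if __name__ == "__main__":
    # A fronto-parallel plane at depth 5 should face the camera directly.
    flat = np.full((8, 8), 5.0)
    n_flat = normals_from_depth(flat, fx=100.0, fy=100.0, cx=4.0, cy=4.0)
    print(n_flat[4, 4])
```

A consistency check of this kind compares normals derived from an estimated depth map with independently estimated normals, for example via `angular_error_deg`; large angles flag pixels where the two estimates disagree.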

Key words: depth estimation, normal estimation, multi-view dense matching, deep learning, 3D reconstruction

CLC Number: