Acta Geodaetica et Cartographica Sinica, 2024, Vol. 53, Issue (12): 2391-2403. doi: 10.11947/j.AGCS.2024.20230579

• Photogrammetry and Remote Sensing •

Deep learning based multi-view dense matching with joint depth and surface normal estimation

Jin LIU (1,2), Shunping JI (1)

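  1. School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, Hubei 430079, China
  2. School of Communication Engineering, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China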
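  • Received: 2024-01-04; Published: 2025-01-06
  • Corresponding author: Shunping JI; E-mail: liujinwhu@whu.edu.cn; jishunping@whu.edu.cn
  • About the author: LIU Jin (1996–), female, PhD; research interests: multi-view stereo, dense image matching, and 3D reconstruction. E-mail: liujinwhu@whu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (42171430)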

Deep learning based multi-view dense matching with joint depth and surface normal estimation

Jin LIU (1,2), Shunping JI (1)

  1. School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China
  2. School of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, China
  • Received: 2024-01-04; Published: 2025-01-06
  • Corresponding author: Shunping JI; E-mail: liujinwhu@whu.edu.cn; jishunping@whu.edu.cn
  • About the author: LIU Jin (1996–), female, PhD; research interests include multi-view stereo, dense image matching, and 3D reconstruction. E-mail: liujinwhu@whu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (42171430)

Abstract:

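In recent years, deep learning-based multi-view dense matching methods have shown great potential in 3D reconstruction tasks, but they remain weak at recovering the fine geometric structure of scenes. In some traditional multi-view dense matching methods, surface normals are often used as an important geometric constraint that helps infer fine depth maps; however, normal information, which encodes the geometric structure of the scene, has not been fully exploited by deep learning methods. For multi-view dense matching and 3D scene reconstruction, this paper proposes a deep learning-based method for joint depth and normal estimation. The method adopts a multi-stage pyramid network architecture that infers depth and surface normals simultaneously from multi-view images and promotes their joint optimization. The proposed network consists of a feature extraction module, a normal-assisted depth estimation module, a depth-assisted normal estimation module, and a depth-normal joint optimization module. The depth estimation module integrates surface normal information to build a geometry-aware cost volume from which a fine depth map is inferred; the normal estimation module uses depth constraints to build a local cost volume for inferring fine normals; and the joint optimization module enhances the geometric consistency between the estimated depth and normals. Experimental results on the WHU-OMVS dataset show that the proposed method performs strongly on both depth inference and surface normal inference and significantly outperforms existing methods. 3D reconstruction results on two different datasets further show that the method effectively recovers the geometric structure of both local high-curvature regions and global planar regions, helping to obtain well-structured, high-quality 3D scene models.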

Key words: depth estimation, normal estimation, multi-view dense matching, deep learning, 3D reconstruction
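Traditional multi-view matching methods, for example PatchMatch-style methods that estimate a depth and a normal per pixel, use the normal as a geometric constraint by treating a pixel's neighbourhood as a local plane. The sketch below shows only that textbook plane relation, assuming a pinhole camera with intrinsic matrix `K` and z-depth; the helper names `pixel_ray` and `plane_induced_depth` are hypothetical, and this is not the normal-assisted depth module or the geometry-aware cost volume proposed in the paper. With viewing ray r = K^{-1}[u, v, 1]^T, the plane through the point d_p r_p with normal n gives the depth at a neighbouring pixel q as d_q = d_p (n · r_p) / (n · r_q).

```python
import numpy as np


def pixel_ray(K, u, v):
    """Viewing ray K^{-1} [u, v, 1]^T; its z-component is 1 for a standard pinhole K."""
    return np.linalg.solve(K, np.array([u, v, 1.0]))


def plane_induced_depth(K, p, depth_p, normal, q):
    """Depth at pixel q implied by the local plane through pixel p's 3D point (hypothetical helper).

    The plane is n . X = n . (depth_p * r_p); the point depth_q * r_q must lie on it.
    """
    n = np.asarray(normal, dtype=float)
    r_p = pixel_ray(K, *p)
    r_q = pixel_ray(K, *q)
    denom = n @ r_q
    if abs(denom) < 1e-12:
        raise ValueError("the ray through q is parallel to the local plane")
    return float(depth_p * (n @ r_p) / denom)


K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
print(plane_induced_depth(K, (50, 50), 10.0, (0, 0, 1), (60, 50)))  # fronto-parallel plane
print(plane_induced_depth(K, (50, 50), 10.0, (1, 0, 1), (60, 50)))  # slanted plane x + z = 10
```

With d_p = 10 at the principal point, the fronto-parallel normal (0, 0, 1) keeps the depth at pixel (60, 50) at 10, whereas the tilted normal (1, 0, 1) gives 10 / 1.1 ≈ 9.09, the depth of the plane x + z = 10 along that ray. This is why a good normal lets a depth hypothesis be propagated consistently across slanted surfaces instead of assuming fronto-parallel patches.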

Abstract:

In recent years, deep learning-based multi-view stereo matching methods have demonstrated significant potential in 3D reconstruction tasks. However, they still have limitations in recovering the fine geometric details of scenes. In some traditional multi-view stereo matching methods, surface normals often serve as a crucial geometric constraint that assists in inferring finer depth. Nevertheless, surface normal information, which encodes the geometric structure of the scene, has not been fully exploited by modern learning-based methods. This paper introduces a deep learning-based method for joint depth and surface normal estimation for multi-view dense matching and 3D scene reconstruction. The proposed method employs a multi-stage pyramid structure to infer depth and surface normals simultaneously from multi-view images and to promote their joint optimization. The network consists of a feature extraction module, a normal-assisted depth estimation module, a depth-assisted normal estimation module, and a depth-normal joint optimization module. Specifically, the depth estimation module integrates surface normal information to construct a geometry-aware cost volume for fine depth estimation; the normal estimation module uses depth constraints to build a local cost volume for inferring fine-grained normal maps; and the joint optimization module further enhances the geometric consistency between the estimated depth and normals. Experimental results on the WHU-OMVS dataset demonstrate that the proposed method performs strongly in both depth and surface normal estimation and significantly outperforms existing methods. Furthermore, 3D reconstruction results on two different datasets indicate that the method effectively recovers the geometric structures of both local high-curvature areas and global planar regions, helping to obtain well-structured, high-quality 3D scene models.

Key words: depth estimation, normal estimation, multi-view dense matching, deep learning, 3D reconstruction
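The joint optimization module is described as enforcing geometric consistency between the depth and normal outputs. One common way to quantify such consistency, shown here as an assumption rather than the paper's formulation, is to derive normals from the depth map (back-project each pixel with `K`, then take the cross product of the differences to its right and lower neighbours) and measure the angle to the predicted normals. The helper names are hypothetical; normals are oriented toward the camera, and the derived map is one pixel smaller in each dimension, so a predicted normal map would need the same cropping before comparison.

```python
import numpy as np


def depth_to_points(depth, K):
    """Back-project an H x W z-depth map to an H x W x 3 array of camera-frame points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=float), np.arange(h, dtype=float))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)  # homogeneous pixel coordinates
    rays = pix @ np.linalg.inv(K).T                   # K^{-1} [u, v, 1]^T per pixel
    return rays * depth[..., None]


def normals_from_depth(depth, K):
    """Unit normals from forward differences of back-projected points (hypothetical helper)."""
    pts = depth_to_points(depth, K)
    dx = pts[:-1, 1:] - pts[:-1, :-1]  # neighbour one column to the right
    dy = pts[1:, :-1] - pts[:-1, :-1]  # neighbour one row down
    n = np.cross(dx, dy)
    n /= np.linalg.norm(n, axis=-1, keepdims=True)
    n[n[..., 2] > 0] *= -1             # orient toward the camera at the origin (z < 0)
    return n                           # shape (H - 1) x (W - 1) x 3


def mean_angular_error_deg(n_pred, n_ref):
    """Mean angle in degrees between two maps of unit normals of the same shape."""
    cos = np.clip(np.sum(n_pred * n_ref, axis=-1), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)).mean())


K = np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 2.0], [0.0, 0.0, 1.0]])
# Depth of the slanted plane x + z = 10 seen by camera K (each ray has z-component 1).
rx = (np.arange(5, dtype=float) - 2.0) / 100.0
depth = np.tile(10.0 / (1.0 + rx), (4, 1))
n_depth = normals_from_depth(depth, K)
print(n_depth[0, 0])  # close to [-0.7071, 0, -0.7071]
```

For this synthetic plane, every derived normal matches the plane normal (-1, 0, -1)/√2, so comparing it with a fronto-parallel prediction (0, 0, -1) yields a mean angular error of 45°. A learned consistency term would typically penalize such angles, but how the paper weights or combines them is not stated in the abstract.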
