Acta Geodaetica et Cartographica Sinica ›› 2024, Vol. 53 ›› Issue (2): 344-352. doi: 10.11947/j.AGCS.2024.20220607

• Photogrammetry and Remote Sensing •

A Swin Transformer-CNN-based height estimation method for monocular remote sensing images and its application in highway construction scenes

LIAO Zhaohong1, ZHANG Yichen1, YANG Biao2, LIN Mingchun3, SUN Wenbo1, GAO Zhi1

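  1. School of Remote Sensing Information Engineering, Wuhan University, Wuhan, Hubei 430079;
    2. Guangzhou Expressway Co., Ltd., Guangzhou, Guangdong 510320;
    3. Road & Bridge International Co., Ltd., Beijing 101107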
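  • Received: 2022-10-25 Revised: 2023-06-26 Published: 2024-03-08
  • Corresponding author: GAO Zhi E-mail: gaozhinus@gmail.com
  • About the author: LIAO Zhaohong (2002-), female, master's student; research interests: computer vision and intelligent systems. E-mail: liaozhaohong@whu.edu.cn
  • Supported by:
    The National Key Research and Development Program of China (2020YFD1100203); the Natural Science Foundation of Hubei Province (2021CFA088)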

Monocular height estimation method of remote sensing image based on Swin Transformer-CNN and its application in highway road construction sites

LIAO Zhaohong1, ZHANG Yichen1, YANG Biao2, LIN Mingchun3, SUN Wenbo1, GAO Zhi1   

  1. School of Remote Sensing Information Engineering, Wuhan University, Wuhan 430079, China;
    2. Guangzhou Expressway Co., Ltd., Guangzhou 510320, China;
    3. Road & Bridge International Co., Ltd., Beijing 101107, China
  • Received:2022-10-25 Revised:2023-06-26 Published:2024-03-08
  • Supported by:
    The National Key Research and Development Program of China (No. 2020YFD1100203); The Hubei Natural Science Foundation of China (No. 2021CFA088)

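Abstract: When remote sensing images have good geometric conditions and radiometric quality, estimating scene height by pixel-wise dense stereo matching of multi-view images is a relatively mature technique that achieves high accuracy and efficiency. However, when multi-view images of such quality are difficult to obtain, the geometric processing methods of classical photogrammetry and computer vision may face considerable challenges. This paper studies this problem. To address the large differences in height distribution across the parts of a large-format remote sensing image, which make model training difficult, it proposes a monocular height estimation method for remote sensing images based on Swin Transformer and a convolutional neural network (CNN). On the one hand, Swin Transformer uses shifted windows and a hierarchical design, combining the ability of CNNs to process large images and extract multi-scale features with the global information interaction of the Transformer. On the other hand, to counter the training instability caused by these differences in height distribution, the method adaptively partitions the height values for each input image, recasting height estimation as a classification-regression problem; the final height of each pixel is obtained from the partitioned height values and their distribution probabilities. Experiments show that the proposed Swin Transformer-CNN method performs well both qualitatively and quantitatively, and that it can be applied to highway construction scenes, showing good generalization.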

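Key words: intelligent interpretation of remote sensing images, deep learning, monocular height prediction, global information, convolutional neural network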

Abstract: At present, when image geometry and radiometric quality are good, 3D scene reconstruction by dense matching of multi-view aerospace images is relatively mature and achieves good results in both accuracy and efficiency. However, when multi-view aerospace images with good geometric conditions are difficult to obtain, the geometric processing methods of classical photogrammetry and computer vision may face great challenges. In this paper, we study this problem and propose a monocular height estimation method for remote sensing images based on Swin Transformer and a convolutional neural network (CNN). Swin Transformer is a hierarchical Transformer structure with shifted windows. It combines the ability of convolutional neural networks to process large-scale images and extract multi-scale features with the global information interaction ability of the Transformer. In addition, our method reformulates height estimation as a classification-regression problem to improve model performance. Specifically, for each input image, our model adaptively divides the height range into several discrete bins, and the continuous height value is estimated as a linear combination of the predicted discrete bins weighted by the height distribution probabilities. In experiments, we demonstrate qualitatively and quantitatively that the proposed method outperforms state-of-the-art approaches, and that it can be applied to highway construction sites with good generalization.

Key words: intelligent interpretation of remote sensing image, deep learning, monocular height estimation, global information, convolutional neural networks
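
The bin-based height computation described in the abstract can be illustrated with a minimal NumPy sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration only: the function names `bin_centers` and `heights_from_bins`, the use of a softmax to normalize bin widths and per-pixel scores, the choice of the bin center as the representative height of each bin, and the `h_min`/`h_max` range. It is not the authors' network, only the final step in which per-image bins and per-pixel probabilities are combined linearly.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def bin_centers(width_logits, h_min, h_max):
    """Map per-image width scores of shape (K,) to K bin centers in [h_min, h_max].

    Assumption: widths are normalized with a softmax so that they are
    positive and sum to the full height range of the image.
    """
    widths = softmax(width_logits, axis=0) * (h_max - h_min)
    edges = h_min + np.concatenate(([0.0], np.cumsum(widths)))
    return 0.5 * (edges[:-1] + edges[1:])


def heights_from_bins(centers, pixel_logits):
    """Combine bin centers (K,) with per-pixel scores (K, H, W) into heights (H, W).

    Each pixel's height is the probability-weighted sum of the bin centers.
    """
    probs = softmax(pixel_logits, axis=0)
    return np.einsum("k,khw->hw", centers, probs)


# Hypothetical usage with random scores standing in for network outputs.
rng = np.random.default_rng(0)
centers = bin_centers(rng.normal(size=8), h_min=0.0, h_max=40.0)
height_map = heights_from_bins(centers, rng.normal(size=(8, 4, 4)))
print(height_map.shape)
```

Because the bins are recomputed for every image, an image covering flat terrain and one covering tall structures each get a partition suited to their own height range, which is the motivation the abstract gives for the adaptive division.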

CLC number: