Acta Geodaetica et Cartographica Sinica, 2024, Vol. 53, Issue (2): 344-352. doi: 10.11947/j.AGCS.2024.20220607

• Photogrammetry and Remote Sensing •

Monocular height estimation method of remote sensing image based on Swin Transformer-CNN and its application in highway road construction sites

LIAO Zhaohong1, ZHANG Yichen1, YANG Biao2, LIN Mingchun3, SUN Wenbo1, GAO Zhi1   

    1. School of Remote Sensing Information Engineering, Wuhan University, Wuhan 430079, China;
    2. Guangzhou Expressway Co., Ltd., Guangzhou 510320, China;
    3. Road & Bridge International Co., Ltd., Beijing 101107, China
  • Received: 2022-10-25 Revised: 2023-06-26 Published: 2024-03-08
  • Supported by:
    The National Key Research and Development Program of China (No. 2020YFD1100203); The Hubei Natural Science Foundation of China (No. 2021CFA088)

Abstract: When image geometry and radiometric quality are good, 3D scene reconstruction by dense matching of multi-view aerospace images is a relatively mature technology that achieves good results in both accuracy and efficiency. However, when multi-view aerospace images with good geometric conditions are difficult to obtain, the geometric processing methods of classical photogrammetry and computer vision face great challenges. To address this problem, we propose a monocular height estimation method for remote sensing images based on Swin Transformer and a convolutional neural network (CNN). Swin Transformer is a hierarchical transformer structure with shifted windows. It combines the ability of convolutional neural networks to process large-scale images and extract multi-scale features with the global information interaction ability of the transformer. In addition, our method reformulates height estimation as a classification-regression problem to improve model performance. Specifically, for each input image, the model adaptively divides the height range into several discrete bins, and the continuous height value is estimated as a linear combination of the predicted bins weighted by the predicted height distribution probabilities. Qualitative and quantitative experiments demonstrate that the proposed method outperforms state-of-the-art approaches and that it generalizes well when applied to highway construction sites.
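The classification-regression step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the use of bin centers as the per-bin height values, the softmax over per-pixel scores, and the height range of 0 to 40 m are all assumptions made for illustration. The sketch only shows how a continuous height can be recovered as a probability-weighted combination of adaptively sized bins.

```python
import numpy as np


def bin_centers_from_widths(raw_widths, h_min, h_max):
    """Map positive, unnormalized bin widths to bin centers on [h_min, h_max].

    Assumption: the network predicts one width per bin for each image,
    which is what makes the binning adaptive.
    """
    widths = raw_widths / raw_widths.sum()  # normalized widths sum to 1
    edges = h_min + (h_max - h_min) * np.concatenate(([0.0], np.cumsum(widths)))
    return 0.5 * (edges[:-1] + edges[1:])   # midpoint of each bin


def softmax(logits, axis=0):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def height_from_bins(logits, centers):
    """Combine per-pixel bin probabilities with bin centers.

    logits:  array of shape (K, H, W), one score per bin and pixel
    centers: array of shape (K,), bin centers in metres
    returns: array of shape (H, W), estimated height per pixel
    """
    probs = softmax(logits, axis=0)                   # (K, H, W), sums to 1 over K
    return np.tensordot(centers, probs, axes=(0, 0))  # sum_k centers[k] * probs[k]


# Hypothetical example: three bins over an assumed 0-40 m height range.
centers = bin_centers_from_widths(np.array([1.0, 1.0, 2.0]), 0.0, 40.0)
heights = height_from_bins(np.zeros((3, 2, 2)), centers)
```

With the widths above the bin edges fall at 0, 10, 20 and 40 m, so the centers are 5, 15 and 30 m. A pixel whose probabilities are uniform over the bins receives the mean of the centers, while a pixel with one dominant bin receives a height close to that bin's center.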

Key words: intelligent interpretation of remote sensing image, deep learning, monocular height estimation, global information, convolutional neural networks
